
[IEEE 2013 International Joint Conference on Neural Networks (IJCNN 2013 - Dallas) - Dallas, TX, USA (2013.08.4-2013.08.9)] The 2013 International Joint Conference on Neural Networks



A New Bi-clustering Approach Using Topological Maps

Amine Chaibi, Mustapha Lebbah, Hanane Azzag

Abstract—In this paper, we propose a new bi-clustering algorithm based on self-organizing maps, called BiTM (Bi-clustering using Topological Maps). BiTM provides a simultaneous clustering of the rows and columns of the data matrix in order to increase the homogeneity of the bi-clusters, by respecting the neighborhood relationship and using a single map. BiTM maps provide a new topological visualization of the bi-clusters. Experimental results and comparison studies show that BiTM improves the results in terms of bi-clustering and visualization.

Index Terms — Bi-clustering, self-organizing maps, visualization

I. INTRODUCTION

Bi-clustering methods have become a topic of interest due to their many applications in data mining, with successful applications in gene expression data analysis. The bi-clustering paradigm seeks to simultaneously find sub-matrices, or blocks, that represent clusters of rows and clusters of columns. The term bi-clustering was first used by Cheng and Church [4] in gene expression data analysis. Terms such as co-clustering, bidimensional clustering and subspace clustering, among others, are often used in the literature to refer to the same problem formulation. Different formulations of the bi-clustering problem have been proposed, such as the partitioning model [13], Bayesian bi-clustering [19], spectral analysis [12], greedy search [1], exhaustive enumeration [22] and self-organizing maps [2]. In the direct clustering approach (Block Clustering) [13], the data matrix is divided into several sub-matrices corresponding to blocks. The division of a block depends on the variance of its values: the lower the variance, the more constant the block.

Furthermore, in [13], the author proposes two other algorithms. The first one, "One-Way Splitting", is based on clustering the observations with features that have an intra-class variance greater than a given threshold, in order to split the associated class. The second approach, named "Two-Way Splitting", divides rows and columns, and computes a large number of variances at each iteration. Using the same idea, the authors of [8] present a coupled two-way clustering (CTWC) approach, which uses the Super Paramagnetic Clustering (SPC) method [9] to cluster columns using all rows, and vice versa.

Recently, bi-clustering approaches based on a matrix decomposition formulation have been proposed, as in [17], [15]. In [17], the authors propose a method named NBVD, which

Amine Chaibi, Mustapha Lebbah and Hanane Azzag are with LIPN-UMR 7030, University of Paris 13 - Sorbonne City, av. J-B Clement, F-93430 Villetaneuse. [email protected]

factorizes the data matrix into three components: the row coefficient matrix, the block value matrix and the column coefficient matrix. In [15], the authors propose an approach named "Co-clustering Under Nonnegative Matrix Tri-Factorization" (CUNMTF), which generalizes the idea of NMF [16] to factorize the original matrix into three nonnegative matrices.

In this paper, we focus on the topological partitioning-based bi-clustering formulation, which optimizes an objective function measuring the quality of the bi-clustering results. Our approach is based on the self-organizing map [14] and the bi-clustering method Croeuc proposed in [10]. The author of [10] defines three algorithms, for continuous, binary and contingency tables, that proceed by optimizing partitions of rows and columns using an iterative k-means procedure. In [15], the authors prove that double k-means is equivalent to an algebraic NMF problem under suitable constraints. Other probabilistic model-based approaches have also been proposed in [11], [18].

Furthermore, there are some other bi-clustering methods based on self-organizing maps (SOM), such as DCC (Double Conjugated Clustering) [3] and KDISJ (Kohonen for Disjunctive Tables) [5]. The drawback of the DCC method is the use of two maps (a map of observations and a map of features), which are built independently with the same size. As for KDISJ, it is dedicated only to categorical data.

In this paper, we propose to formulate the bi-clustering problem using a SOM-like learning step, which provides an intuitive interpretation of the bi-clustering structure through sub-matrices organized on a topological map. The first contribution of this paper is the proposal of a new topological map algorithm dedicated to bi-clustering, applied to data that is not necessarily reorganized, using only a single map. The second contribution consists of providing a new visualization of the bi-clustering results.

The rest of this paper is organized as follows: in Section II, we present the model and algorithms; Section III is devoted to the methodology and experimental results. Finally, we draw some conclusions and outline future work in Section IV.

II. TOPOLOGICAL BI-CLUSTERING: BITM MODEL

Like traditional self-organizing maps, which are increasingly used as tools for clustering and visualization, BiTM (Bi-clustering using Topological Maps) consists of a discrete set of cells $C$, called the map, of size $K$. The map has a discrete topology defined as an undirected graph; it is usually a regular grid in two dimensions. For each pair of cells $(c, r)$ on the map, the distance $\delta(c, r)$ is defined as the length of the shortest chain linking cells $r$ and $c$ on the grid. For each cell $c$, this


distance defines a neighborhood. Let $\mathbb{R}^d$ be the Euclidean data space and $D$ the data matrix, where each observation $x_i = (x_i^1, x_i^2, \ldots, x_i^j, \ldots, x_i^d)$ is a vector in $\mathbb{R}^d$. The set of rows (observations) is denoted by $I = \{1, \ldots, N\}$. Similarly, the set of columns (features) is denoted by $J = \{1, \ldots, d\}$. We are interested in simultaneously clustering the observations $I$ into $K$ clusters $\{P_1, P_2, \ldots, P_k, \ldots, P_K\}$, and the features $J$ into $L$ clusters $\{Q_1, Q_2, \ldots, Q_l, \ldots, Q_L\}$.

The main purpose of BiTM is to transform a data matrix $D$ into a block structure organized on a topological map. In BiTM, each cell $k \in C$ is associated with a prototype $g_k = (g_k^1, g_k^2, \ldots, g_k^l, \ldots, g_k^L)$, where $L < d$ and $g_k^l \in \mathbb{R}$. To facilitate the formulation, we define two binary matrices $Z = (z_{ik})$ and $W = (w_{jl})$ that store the assignments of observations and features, respectively:

$$z_{ik} = \begin{cases} 1 & \text{if } x_i \in P_k,\ k = \phi_z(x_i) \\ 0 & \text{otherwise} \end{cases}
\qquad
w_{jl} = \begin{cases} 1 & \text{if } x^j \in Q_l,\ l = \phi_w(x^j) \\ 0 & \text{otherwise} \end{cases}$$

We denote by $\phi_z$ the assignment function of rows (observations) and by $\phi_w$ the assignment function of columns (features). Hence, we can identify the block or bi-cluster of data denoted by $B_{lk} = \{x_i^j \mid z_{ik} \times w_{jl} = 1\}$. Thus, using topological maps, we propose to minimize the following new cost function:

$$\tilde{R}(\phi_w, \phi_z, G) = \sum_{k=1}^{K} \sum_{l=1}^{L} \sum_{x_i \in P_k} \sum_{x^j \in Q_l} \sum_{r=1}^{K} K^T(\delta(r,k)) \, \| x_i^j - g_r^l \|^2$$

where
• $G = \{g_1, \ldots, g_K\}$ denotes the set of prototypes,
• the term $K^T(\delta(r,k))$ denotes the neighborhood function,
• $T$ is the temperature, which controls the radius of the neighborhood,
• in practice, we use the neighborhood function defined as $K^T(\delta(c,r)) = \exp\left(\frac{-\delta(c,r)}{T}\right)$.
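To make the notation concrete, the grid distance $\delta(c,r)$ and the neighborhood function $K^T$ can be sketched as follows. This is an illustrative sketch that assumes a 4-connected rectangular grid, where the shortest-chain length reduces to the Manhattan distance between cell coordinates; the function names and linear cell indexing are our own, not the paper's:

```python
import math

def grid_delta(c, r, n_cols):
    """Shortest-chain length delta(c, r) between cells c and r on a
    4-connected rectangular grid with n_cols columns, i.e. the Manhattan
    distance between their (row, col) coordinates."""
    ci, cj = divmod(c, n_cols)
    ri, rj = divmod(r, n_cols)
    return abs(ci - ri) + abs(cj - rj)

def neighborhood(c, r, n_cols, T):
    """K^T(delta(c, r)) = exp(-delta(c, r) / T)."""
    return math.exp(-grid_delta(c, r, n_cols) / T)
```

For example, on a 3×3 map the opposite corner cells are four grid steps apart, so their mutual influence at temperature $T = 1$ is $e^{-4}$.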

The minimization of $\tilde{R}(\phi_w, \phi_z, G)$ is performed by iterating four steps until stabilization. The main phases of the BiTM algorithm are presented as follows:

Algorithm 1: BiTM Algorithm
1: Inputs:
• the data $D = \{x_i^j\}_{i=1..N,\ j=1..d}$,
• the assignment matrices $Z, W$,
• the initialized set of prototypes $G$ associated with the map,
• $t_{max}$: the maximum number of iterations.
2: Outputs:
• the assignment matrices $Z, W$,
• the updated prototypes $G$.
3: Observation assignment phase: each observation $x_i$ is assigned to the closest prototype $g_k$ using the assignment function, defined as follows:
$$\phi_z(x_i) = \arg\min_{c} \sum_{j=1}^{d} \sum_{l=1}^{L} \sum_{r=1}^{K} w_{jl} \times K^T(\delta(r,c)) \times \| x_i^j - g_r^l \|^2$$
4: Quantization phase: the prototype vectors are updated using the following expression:
$$g_k^l = \frac{\sum_{i=1}^{N} \sum_{j=1}^{d} K^T(\delta(k, \phi_z(x_i))) \times w_{jl} \times x_i^j}{\sum_{i=1}^{N} \sum_{j=1}^{d} K^T(\delta(k, \phi_z(x_i))) \times w_{jl}}$$
5: Feature assignment phase: each feature $x^j$ is assigned to the closest column cluster using the assignment function, defined as follows:
$$\phi_w(x^j) = \arg\min_{l} \sum_{i=1}^{N} \sum_{k=1}^{K} \sum_{r=1}^{K} z_{ik} \times K^T(\delta(r,k)) \times \| x_i^j - g_r^l \|^2$$
6: Quantization phase: the prototype vectors are updated again, using the same expression as in step 4.
Repeat phases 3, 4, 5 and 6 until $t = t_{max}$.
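The four phases can be sketched in NumPy as follows. This is a minimal illustration under our own naming (`bitm_step`, `row_assign`, `col_assign`) and a 4-connected-grid assumption for $\delta$; it is a sketch of the update rules above, not the authors' implementation:

```python
import numpy as np

def grid_distance_matrix(rows, cols):
    # delta(c, r): Manhattan distance on a 4-connected (rows x cols) grid
    coords = np.array([(i, j) for i in range(rows) for j in range(cols)])
    return np.abs(coords[:, None, :] - coords[None, :, :]).sum(-1)

def bitm_step(X, G, row_assign, col_assign, delta, T):
    """One BiTM iteration (phases 3-6), updating arguments in place.

    X: (N, d) data; G: (K, L) prototypes;
    row_assign: (N,) cell index per observation (phi_z);
    col_assign: (d,) column-cluster index per feature (phi_w);
    delta: (K, K) grid distances; T: temperature.
    """
    N, d = X.shape
    K, L = G.shape
    kern = np.exp(-delta / T)                      # K^T(delta(r, c))

    # Phase 3: assign each observation to the best cell c
    for i in range(N):
        cost = np.array([
            sum(kern[r, c] * (X[i, j] - G[r, col_assign[j]]) ** 2
                for j in range(d) for r in range(K))
            for c in range(K)])
        row_assign[i] = cost.argmin()

    # Phases 4 and 6: update prototypes g_k^l as neighborhood-weighted means
    def update_G():
        for k in range(K):
            h = kern[k, row_assign]                # K^T(delta(k, phi_z(x_i)))
            for l in range(L):
                mask = (col_assign == l)
                denom = h.sum() * mask.sum()
                if denom > 0:
                    G[k, l] = (h[:, None] * X[:, mask]).sum() / denom
    update_G()                                     # phase 4

    # Phase 5: assign each feature to the best column cluster l
    for j in range(d):
        cost = np.array([
            sum(kern[r, row_assign[i]] * (X[i, j] - G[r, l]) ** 2
                for i in range(N) for r in range(K))
            for l in range(L)])
        col_assign[j] = cost.argmin()

    update_G()                                     # phase 6
    return G, row_assign, col_assign
```

Because $z_{ik}$ and $w_{jl}$ are indicator matrices, the inner sums over $k$ and $l$ collapse to lookups of the current assignments, which is what the sketch exploits instead of materializing $Z$ and $W$.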

The BiTM training is usually performed in two phases. In the first phase, a large initial neighborhood radius is used, starting from $T_{max}$. In the second phase, the neighborhoods are small right from the beginning. The temperature $T$ decreases over the iterations from $T_{max}$ to $T_{min}$, in the same way as in traditional topological maps.
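The paper does not specify the decay law for $T$; a common choice in traditional topological maps, shown here purely as an assumption, is a geometric interpolation from $T_{max}$ down to $T_{min}$:

```python
def temperature(t, t_max, T_max, T_min):
    """Geometric decay from T_max (at t = 0) to T_min (at t = t_max),
    as commonly used in SOM training; assumed, not specified in the paper."""
    return T_max * (T_min / T_max) ** (t / t_max)
```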

A. Topological order in BiTM Model

The decomposition of the cost function $\tilde{R}$, which depends on the value of $T$, permits rewriting its expression as follows:

$$\tilde{R}(\phi_w, \phi_z, G) = \sum_{k=1}^{K} \sum_{l=1}^{L} \sum_{x_i \in P_k} \sum_{x^j \in Q_l} \sum_{\substack{r=1 \\ r \neq k}}^{K} K^T(\delta(r,k)) \, \| x_i^j - g_r^l \|^2 \;+\; K^T(0) \sum_{r=1}^{K} \sum_{l=1}^{L} \sum_{x_i \in P_r} \sum_{x^j \in Q_l} \| x_i^j - g_r^l \|^2$$


The cost function $\tilde{R}$ is thus decomposed into two terms. In order to maintain the topological order between blocks, minimizing the first term brings closer the blocks corresponding to two neighboring cells. Indeed, if $r$ and $k$ are neighbors on the map, the value of $\delta(r,k)$ is low and the value of $K^T(\delta(r,k))$ is therefore high; minimizing the first term then has the effect of reducing the value of the cost function while preserving the topology. Minimizing the second term corresponds to minimizing the inertia of the local data assigned to the blocks $B_{lr}$, $l = 1..L$. For different values of the temperature $T$, each term of the cost function has a relative relevance in the minimization process. We can distinguish two steps in the operation of the algorithm:

• The first step corresponds to high values of $T$, where the first term is dominant; in this case, the priority is to preserve the topology.

• The second step corresponds to small $T$, where the second term dominates the cost function. The adaptation is therefore very local, and the BiTM algorithm behaves like a double k-means.

III. EXPERIMENTS

We present in this section a set of experiments on several datasets extracted from the UCI repository [7]. Table I lists the public datasets, the number of real classes and the map size used in the learning phase.

dataset          # observations  # features  Map size  # real classes
isolet5               1559           617      12×12         26
Movement Libras         45            90       5×5          15
Breast                 699            10       7×7           2
Sonar mines            208            60       6×6           2
Lung Cancer             32            56       4×4           2
Spectf 1               349            44       4×4           2
Cancer Wpbc Ret        198            33       6×6           2
Horse Colic            300            27       5×5           2
Heart                  270            13       5×5           2
glass                  214             9       5×5           7

TABLE I
REAL DATASETS DESCRIPTION.

To evaluate the clustering performance, three criteria are used, each of which should be maximized: Accuracy (Acc), Rand and Normalized Mutual Information (NMI) [21]. To compute these criteria, we consider a set of $N$ objects of $L$ classes classified into $K$ clusters, and two partitions to compare: $X = \{x_1, \ldots, x_N\}$, where $x_k \in [C_1..C_K]$ is a random variable for the cluster assignments, and $Y = \{y_1, \ldots, y_N\}$, where $y_l \in [B_1..B_L]$ is a random variable for the pre-existing labels. Hence, the contingency table can be expressed as in Table II.

The Accuracy (Acc) is defined as follows:

$$Acc = \frac{\sum_{k} \max_{l=1..L}(n_{lk})}{N} \quad (1)$$

R\C    C_1    C_2    ...    C_k    ...    C_K    Sum
B_1    n_11   n_12   ...    n_1k   ...    n_1K   N_1*
B_2    n_21   n_22   ...    n_2k   ...    n_2K   N_2*
...
B_l    n_l1   n_l2   ...    n_lk   ...    n_lK   N_l*
...
B_L    n_L1   n_L2   ...    n_Lk   ...    n_LK   N_L*
Sum    N_*1   N_*2   ...    N_*k   ...    N_*K   N

TABLE II
CONTINGENCY TABLE.

The Normalized Mutual Information (NMI) is given by the following expression:

$$NMI = \frac{\sum_{l} \sum_{k} n_{lk} \log_2\left(\frac{N \cdot n_{lk}}{N_{l*} N_{*k}}\right)}{\sqrt{\left(\sum_{l} N_{l*} \log_2\frac{N_{l*}}{N}\right)\left(\sum_{k} N_{*k} \log_2\frac{N_{*k}}{N}\right)}} \quad (2)$$

The Rand measure is computed as follows:

$$Rand = \frac{N_{00} + N_{11}}{\binom{N}{2}} \quad (3)$$

where
• $N_{11}$ is the number of pairs that are in the same cluster in both $B$ and $C$,
• $N_{00}$ is the number of pairs that are in different clusters in both $B$ and $C$.
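The three criteria can be computed directly from the contingency table. The sketch below follows Eqs. (1)–(3), assuming the NMI denominator is the geometric mean of the two partition entropies and that $N_{00}$ is obtained by inclusion-exclusion over the marginals; the function name `scores` is our own:

```python
import numpy as np
from math import comb, log2

def scores(n):
    """Acc, NMI and Rand from a contingency table n (L classes x K clusters)."""
    n = np.asarray(n, dtype=float)
    N = n.sum()
    Nl = n.sum(axis=1)                       # N_{l*}
    Nk = n.sum(axis=0)                       # N_{*k}

    acc = n.max(axis=0).sum() / N            # Eq. (1)

    # Eq. (2): mutual information normalized by the two partition entropies
    mi = sum(n[l, k] * log2(N * n[l, k] / (Nl[l] * Nk[k]))
             for l in range(n.shape[0]) for k in range(n.shape[1])
             if n[l, k] > 0)
    hl = -sum(v * log2(v / N) for v in Nl if v > 0)
    hk = -sum(v * log2(v / N) for v in Nk if v > 0)
    nmi = mi / np.sqrt(hl * hk) if hl > 0 and hk > 0 else 0.0

    # Eq. (3): N11 = pairs together in both partitions,
    # N00 = pairs apart in both (by inclusion-exclusion over the marginals)
    n11 = sum(comb(int(v), 2) for v in n.ravel())
    pairs = comb(int(N), 2)
    same_b = sum(comb(int(v), 2) for v in Nl)
    same_c = sum(comb(int(v), 2) for v in Nk)
    n00 = pairs - same_b - same_c + n11
    rand = (n00 + n11) / pairs
    return acc, nmi, rand
```

For a perfect clustering (a diagonal contingency table), all three measures equal 1.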

A. Comparison between BiTM and clustering approaches

Firstly, we compare the results of the BiTM approach against classical SOM (Self-Organizing Map) [14] using the SOM toolbox, HCL (Hierarchical Clustering) [6] using the BicAT software (http://www.tik.ee.ethz.ch/sop/bicat/), NMF (Nonnegative Matrix Factorization) [20], and ONMTF (Orthogonal Nonnegative Matrix Tri-Factorization) [17]. The objective here is not to obtain better performance than clustering approaches, but to show that BiTM does not interfere with SOM and provides performances equivalent to those of clustering approaches. Tables III, IV and V present the results obtained for the purity (Acc), Rand and NMI measures.

Table III depicts the Acc results obtained with BiTM, SOM, HCL, NMF and ONMTF. Our method BiTM provides the best or equivalent results on the majority of datasets. A slight decrease is observed on the HorseColic, Cancer Wpbc Ret and glass datasets. On isolet5, our method obtains 0.316 while SOM provides 0.433, HCL 0.427, NMF 0.107 and ONMTF 0.382. This is due to the high number of features (617).

Table IV depicts a Rand index comparison of BiTM with SOM, HCL, NMF and ONMTF. BiTM provides a better or equivalent Rand measure on isolet5, MovementLibras, Breast, Sonar mines, HorseColic and Heart. Although our method is less efficient on LungCancer, Spectf 1, Cancer Wpbc Ret and glass, BiTM remains stable. For example, on MovementLibras, the highest Rand value is given by SOM with 0.943, while BiTM obtains 0.937; HCL obtains 0.817, NMF 0.789 and ONMTF 0.81. On isolet5, BiTM obtains 0.926, while SOM provides 0.905, HCL 0.812, NMF 0.471 and ONMTF 0.475. We observe that BiTM is the most stable method in terms of the Rand measure.

The NMI results are presented in Table V, which shows that BiTM, SOM and HCL provide equivalent performances. Our method is better than or equivalent to the SOM approach on the MovementLibras, Breast, LungCancer and Heart datasets. HCL shows a very large decrease of NMI on the majority of datasets, except for isolet5, MovementLibras, LungCancer and glass. Despite a slight decrease of NMI on some datasets, our approach provides stable results.

dataset          BiTM   SOM    HCL    NMF    ONMTF
isolet5          0.316  0.433  0.427  0.107  0.382
MovementLibras   0.712  0.711  0.711  0.288  0.311
Breast           0.978  0.974  0.633  0.804  0.821
Sonar mines      0.769  0.744  0.727  0.601  0.649
LungCancer       1      0.906  0.743  0.781  0.875
Spectf 1         0.759  0.716  0.65   0.73   0.727
Cancer Wpbc Ret  0.787  0.828  0.722  0.762  0.762
HorseColic       0.719  0.78   0.713  0.67   0.67
Heart            0.883  0.851  0.755  0.62   0.637
glass            0.618  0.623  0.72   0.481  0.472

TABLE III
ACC COMPARISONS OF VARIOUS CLUSTERING ALGORITHMS WITH BITM.

dataset          BiTM   SOM    HCL    NMF    ONMTF
isolet5          0.926  0.905  0.812  0.471  0.475
MovementLibras   0.937  0.943  0.817  0.789  0.81
Breast           0.687  0.476  0.499  0.545  0.648
Sonar mines      0.508  0.507  0.489  0.504  0.512
LungCancer       0.459  0.425  0.427  0.487  0.542
Spectf 1         0.418  0.403  0.499  0.436  0.43
Cancer Wpbc Ret  0.435  0.372  0.54   0.417  0.408
HorseColic       0.472  0.448  0.449  0.462  0.488
Heart            0.56   0.529  0.512  0.502  0.506
glass            0.653  0.752  0.348  0.689  0.693

TABLE IV
RAND COMPARISONS OF VARIOUS CLUSTERING ALGORITHMS WITH BITM.

dataset          BiTM    SOM    HCL    NMF    ONMTF
isolet5          0.439   0.584  0.562  0.007  0.015
MovementLibras   0.811   0.797  0.555  0.57   0.597
Breast           0.53    0.364  0.003  0.193  0.226
Sonar mines      0.158   0.233  0.001  0.026  0.047
LungCancer       0.461   0.344  0.295  0.111  0.244
Spectf 1         0.1449  0.185  0.01   0.025  0.027
Cancer Wpbc Ret  0.081   0.14   0.005  0.014  0.017
HorseColic       0.06    0.128  0.009  0.03   0.027
Heart            0.247   0.225  0.06   0.04   0.036
glass            0.125   0.463  0.153  0.231  0.244

TABLE V
NMI COMPARISONS OF VARIOUS CLUSTERING ALGORITHMS WITH BITM.

B. Comparison between BiTM and bi-clustering approaches

For the purposes of comparison with other bi-clustering approaches, we have selected three methods. The first one is CTWC (Coupled Two-Way Clustering) [8]; the software license is provided by Pr. Assif Yitzhaky and Pr. Eytan Domany. The second approach is NBVD (Non-negative Block Value Decomposition) [17]. Finally, we compare BiTM with CUNMTF (Co-clustering Under Nonnegative Matrix Tri-Factorization) [15].

Detailed results are shown in Tables VI, VII and VIII. We observe that CTWC did not give results on the MovementLibras dataset.

Table VI lists the Acc results. We observe that, for all datasets, BiTM provides the highest values. On most datasets, we observe a large difference between our approach and the others. For example, on MovementLibras, BiTM obtains 0.712, NBVD 0.33 and CUNMTF 0.333. We also observe that it is difficult to obtain a high value on isolet5.

Experimental results for the Rand measure are shown in Table VII. The performances of BiTM are better or equivalent on most datasets, except for the LungCancer, Spectf 1 and Cancer Wpbc Ret datasets, where CTWC is more efficient.

Table VIII presents the experimental results obtained with BiTM, CTWC, NBVD and CUNMTF on the NMI index. Our method BiTM provides the highest NMI performances on all datasets except the glass dataset.

dataset          BiTM   CTWC   NBVD   CUNMTF
isolet5          0.316  0.103  0.073  0.293
MovementLibras   0.712  NaN    0.33   0.333
Breast           0.978  0.655  0.834  0.834
Sonar mines      0.769  0.548  0.644  0.634
LungCancer       1      0.718  0.875  0.843
Spectf 1         0.759  0.727  0.727  0.727
Cancer Wpbc Ret  0.787  0.762  0.762  0.762
HorseColic       0.719  0.67   0.67   0.673
Heart            0.883  0.555  0.674  0.674
glass            0.618  0.523  0.462  0.462

TABLE VI
ACC COMPARISONS OF VARIOUS BI-CLUSTERING ALGORITHMS WITH BITM.

Fig. 1. Results obtained with BiTM, CTWC, NBVD and CUNMTF on the purity index, using a spider plot.


dataset          BiTM   CTWC   NBVD   CUNMTF
isolet5          0.926  0.91   0.502  0.508
MovementLibras   0.937  NaN    0.845  0.84
Breast           0.687  0.505  0.659  0.688
Sonar mines      0.508  0.502  0.514  0.508
LungCancer       0.459  0.556  0.556  0.536
Spectf 1         0.418  0.513  0.42   0.42
Cancer Wpbc Ret  0.435  0.524  0.414  0.417
HorseColic       0.472  0.463  0.46   0.459
Heart            0.56   0.498  0.513  0.515
glass            0.653  0.69   0.693  0.69

TABLE VII
RAND COMPARISONS OF VARIOUS BI-CLUSTERING ALGORITHMS WITH BITM.

Fig. 2. Results obtained with BiTM, CTWC, NBVD and CUNMTF on the Rand index, using a spider plot.

dataset          BiTM    CTWC   NBVD   CUNMTF
isolet5          0.439   0.077  0.137  0.186
MovementLibras   0.811   NaN    0.688  0.667
Breast           0.53    0.01   0.243  0.0233
Sonar mines      0.158   0.006  0.057  0.04
LungCancer       0.461   0.041  0.309  0.261
Spectf 1         0.1449  0.001  0.016  0.016
Cancer Wpbc Ret  0.081   0.034  0.024  0.031
HorseColic       0.06    0.003  0.03   0.028
Heart            0.247   0.001  0.063  0.061
glass            0.125   0.38   0.246  0.245

TABLE VIII
NMI COMPARISONS OF VARIOUS BI-CLUSTERING ALGORITHMS WITH BITM.

Fig. 3. Results obtained with BiTM, CTWC, NBVD and CUNMTF on the NMI index, using a spider plot.

C. Visualization task

In this work, we consider it very important to make a visual evaluation of the proposed approach. Self-Organizing Maps give analysts an easier way to understand datasets. Figures 4, 5, 6 and 7 show the visual results obtained by the BiTM approach on the isolet5, SonarMines, MovementLibras and LungCancer datasets, respectively.

Figures 4(a), 5(a), 6(a) and 7(a) are dedicated to the visualization of the datasets organized according to the clusters of rows and columns, on the isolet5, SonarMines, MovementLibras and LungCancer datasets respectively. These displays can be obtained with any bi-clustering method. From this visualization alone, it is difficult to analyze the bi-clusters or blocks. To facilitate this task, we propose to visualize the bi-clusters using the topological map.

Figures 4(b), 5(b), 6(b) and 7(b) show the topological organization of the bi-clusters on the isolet5, SonarMines, MovementLibras and LungCancer datasets, respectively. In BiTM, each cell of the map represents a cluster of observations and features that forms a bi-cluster or block. This organization is illustrated by different colors. For example, in the top left of map 4(b) (the first five rows and columns), the clusters of observations and features have weak dominant features, indicated in blue. The blocks on the right (last two columns, all rows) have average dominant features, indicated in yellow. The bottom-center blocks (last four rows, columns five to eight) have high values, indicated in red.

Figures 4(c), 5(c), 6(c) and 7(c) show the cluster membership map on the isolet5, SonarMines, MovementLibras and LungCancer datasets, respectively. The cells are represented by squares that are scaled according to the ratio of observations that BiTM has captured.

We can zoom into a cell in order to analyze the organization of its observations and features. The results obtained by zooming on map 4(b) are shown in Figure 8, where the obtained partitioning can be clearly observed. The color is relatively similar when the features are correlated; this is due to the use of the neighborhood relationship in BiTM. For example, in the top right of Figure 8(d), we can see homogeneous blocks represented in red. This result gives analysts a better understanding of the data's coherence.


(a) dataset organized according to the clusters of observations and features
(b) BiTM map: bi-clusters organized on a map according to the feature and observation clusters
(c) membership map

Fig. 4. BiTM visualization using the isolet5 dataset. Each cell in figures 4(a) and 4(b) indicates a cell of the map.

(a) dataset organized according to the clusters of observations and features
(b) BiTM map: bi-clusters organized on a map according to the feature and observation clusters
(c) membership map

Fig. 5. BiTM visualization using the SonarMines dataset. Each cell in figures 5(a) and 5(b) indicates a cell of the map.

(a) dataset organized according to the clusters of observations and features
(b) BiTM map: bi-clusters organized on a map according to the feature and observation clusters
(c) membership map

Fig. 6. BiTM visualization using the MovementLibras dataset. Each cell in figures 6(a) and 6(b) indicates a cell of the map.

(a) dataset organized according to the clusters of observations and features
(b) BiTM map: bi-clusters organized on a map according to the feature and observation clusters
(c) membership map

Fig. 7. BiTM visualization using the LungCancer dataset. Each cell in figures 7(a) and 7(b) indicates a cell of the map.


(a) zooming on cell #12 of the isolet5 dataset
(b) zooming on cell #31 of the isolet5 dataset
(c) zooming on cell #48 of the isolet5 dataset
(d) zooming on cell #60 of the isolet5 dataset

Fig. 8. Figures obtained by zooming on a few cells of the BiTM map presented in figure 4(b).

IV. CONCLUSION AND PERSPECTIVES

In this paper, we have proposed a new bi-clustering algorithm based on the topological map model. The proposed approach organizes a data matrix into homogeneous blocks by simultaneously considering the sets of rows and columns. The novelty of our approach is to partition the rows (observations) while respecting the neighborhood relationship of the columns (features), and vice versa, using a single map. This allows maintaining a low computational cost. A series of experiments was conducted to validate the proposed method. Experimental results demonstrate that our algorithm is promising and identifies meaningful bi-clusters. We have also presented the capability of BiTM to provide a novel visualization of bi-clustering results using a topological map.

There are many perspectives to study. The first one consists of further improving the performance of the BiTM computation. Secondly, we will adapt BiTM to binary and mixed data.

REFERENCES

[1] Fabrizio Angiulli, Eugenio Cesario, and Clara Pizzuti. A greedy search approach to co-clustering sparse binary matrices. In ICTAI, pages 363–370. IEEE Computer Society, 2006.

[2] K. Benabdeslem and K. Allab. Bi-clustering continuous data with self-organizing map. Neural Computing and Applications, 2012.

[3] Stanislav Busygin, Gerrit Jacobsen, Ewald Krämer, and ContentSoft AG. Double conjugated clustering applied to leukemia microarray data. In 2nd SIAM ICDM, Workshop on Clustering High Dimensional Data, 2002.

[4] Yizong Cheng and George M. Church. Biclustering of expression data, 2000.

[5] Marie Cottrell, Smail Ibbou, and Patrick Letrémy. SOM-based algorithms for qualitative variables. Neural Networks, 17(8-9):1149–1167, October 2004.

[6] M.B. Eisen, P.T. Spellman, P.O. Brown, and D. Botstein. Cluster analysis and display of genome-wide expression patterns, 1998.

[7] A. Frank and A. Asuncion. UCI machine learning repository. Technical report, University of California, Irvine, School of Information and Computer Sciences, available at http://archive.ics.uci.edu/ml, 2010.

[8] G. Getz, E. Levine, and E. Domany. Coupled two-way clustering analysis of gene microarray data. Proc. Natl. Acad. Sci. USA, 97:12079–12084, 2000.

[9] G. Getz, E. Levine, E. Domany, and M. Q. Zhang. Super paramagnetic clustering of yeast gene expression profiles, 2000.

[10] G. Govaert. Classification croisée. PhD thesis, Université Paris 6, France, 1983.

[11] G. Govaert and M. Nadif. Block clustering with Bernoulli mixture models: Comparison of different approaches. Computational Statistics and Data Analysis, 52:3233–3245, 2008.

[12] D. Greene and P. Cunningham. Spectral co-clustering for dynamic bipartite graphs. In Workshop on Dynamic Networks and Knowledge Discovery at ECML'10, Barcelona, Spain.

[13] J. A. Hartigan. Direct clustering of a data matrix. Journal of the American Statistical Association, 67(337):123–129, 1972.

[14] T. Kohonen, M. R. Schroeder, and T. S. Huang, editors. Self-Organizing Maps. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 3rd edition, 2001.

[15] Lazhar Labiod and Mohamed Nadif. Co-clustering under nonnegative matrix tri-factorization. In Proceedings of the 18th International Conference on Neural Information Processing - Volume Part II, ICONIP'11, pages 709–717, Berlin, Heidelberg, 2011. Springer-Verlag.

[16] Daniel D. Lee and H. Sebastian Seung. Learning the parts of objects by nonnegative matrix factorization. Nature, 401:788–791, 1999.

[17] Bo Long, Zhongfei (Mark) Zhang, and Philip S. Yu. Co-clustering by block value decomposition. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, KDD '05, pages 635–640, New York, NY, USA, 2005. ACM.

[18] R. Priam, M. Nadif, and G. Govaert. The block generative topographic mapping. In The Third International Workshop on Artificial Neural Networks in Pattern Recognition, Lecture Notes in Artificial Intelligence (LNCS), number 5064, pages 13–23, Berlin Heidelberg, September 2008. Springer.

[19] Hanhuai Shan and Arindam Banerjee. Residual Bayesian co-clustering for matrix approximation. In SDM, pages 223–234, 2010.

[20] Fanhua Shang, L. C. Jiao, and Fei Wang. Graph dual regularization non-negative matrix factorization for co-clustering. Pattern Recognition, 45(6):2237–2250, June 2012.

[21] Alexander Strehl and Joydeep Ghosh. Cluster ensembles: a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3:583–617, 2002.

[22] Amos Tanay, Roded Sharan, and Ron Shamir. Discovering statistically significant biclusters in gene expression data. In Proceedings of ISMB 2002, pages 136–144, 2002.