
Research Article
A Tensor CP Decomposition Method for Clustering Heterogeneous Information Networks via Stochastic Gradient Descent Algorithms

Jibing Wu, Zhifei Wang, Yahui Wu, Lihua Liu, Su Deng, and Hongbin Huang

Science and Technology on Information System Engineering Laboratory, National University of Defense Technology, Changsha, China

Correspondence should be addressed to Hongbin Huang; hbhuang@nudt.edu.cn

Received 1 December 2016; Revised 14 February 2017; Accepted 16 February 2017; Published 30 April 2017

Academic Editor: Fabrizio Riguzzi

Copyright © 2017 Jibing Wu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Clustering analysis is a basic and essential method for mining heterogeneous information networks, which consist of multiple types of objects and rich semantic relations among different object types. Heterogeneous information networks are ubiquitous in real-world applications, such as bibliographic networks and social media networks. Unfortunately, most existing approaches, such as spectral clustering, are designed to analyze homogeneous information networks, which are composed of only one type of objects and links. Some recent studies have focused on heterogeneous information networks and yielded some research fruits, such as RankClus and NetClus. However, they often assume that the heterogeneous information networks follow some simple schemas, such as the bityped network schema or the star network schema. To overcome the above limitations, we model the heterogeneous information network as a tensor without the restriction of network schema. Then, a tensor CP decomposition method is adapted to formulate the clustering problem in heterogeneous information networks. Further, we develop two stochastic gradient descent algorithms, namely, SGDClus and SOSClus, which lead to effective clustering of multityped objects simultaneously. The experimental results on both synthetic datasets and a real-world dataset demonstrate that our proposed clustering framework can model heterogeneous information networks efficiently and outperform state-of-the-art clustering methods.

1. Introduction

Heterogeneous information networks are ubiquitous in real-world applications. Generally, heterogeneous information networks consist of multiple types of objects and rich semantic relations among different object types. The bibliographic network extracted from the DBLP database (http://www.informatik.uni-trier.de/~ley/db/) is a typical heterogeneous information network, as shown in Figure 1. The DBLP database is an open resource that contains most bibliographic information on computer science. The bibliographic network contains four types of objects: author (A), paper (P), venue (i.e., conference or journal) (V), and term (T). The edges are labeled by "write" or "written by" between author and paper, by "publish" or "published by" between paper and venue, or by "contain" or "contained in" between paper and term.

Clustering analysis is a basic and essential method for mining such networks, which can help us better understand the semantic information and interpretable structure in the network. Unfortunately, most existing approaches, such as spectral clustering, are designed to analyze homogeneous information networks [1] that consist of only a single type of objects and links, while real-world situations are often heterogeneous information networks [2] in nature, with more than one type of objects and links. The mission of clustering such a heterogeneous information network is more difficult than that in a homogeneous information network, as we cannot directly measure the similarity among the different types of objects and relations.

Though some recent studies have focused on clustering heterogeneous information networks, such as RankClus [1] and NetClus [2], they can only be applied to some specific simple network schemas. RankClus can only be used to model bityped networks, where only two different types of objects exist in the network. NetClus was developed for the star network schema, where the links only appear between target objects and attribute objects.


Figure 1: An example of a heterogeneous information network: a bibliographic network extracted from the DBLP database.

The network schema is a metatemplate of a heterogeneous information network, which shows how many types of objects and links there are in the network [3]. Figure 1 shows a typical star network schema, where the paper (P) is the target object and the others are attribute objects.

A tensor is a generalization of the matrix in high-dimensional space. It is a natural expression of complicated and interpretable structures in high-mode data. In this paper, we model a heterogeneous information network as a tensor without the restriction of network schema. Each type of objects maps onto one mode of the tensor, and the semantic relations among different object types map onto the elements of the tensor. Then, a tensor CP decomposition method is adapted to formulate the clustering problem in heterogeneous information networks, and two stochastic gradient descent algorithms are developed, which lead to effective clustering of multityped objects simultaneously. The experimental results on both synthetic datasets and a real-world dataset show that the proposed clustering framework can model heterogeneous information networks efficiently and outperform state-of-the-art clustering methods.

The rest of this paper is organized as follows. In Section 2, we discuss the related work on clustering heterogeneous information networks and tensor factorization. Section 3 gives some notations and definitions used in this paper. In Section 4, we formulate the clustering problem and describe two stochastic gradient descent algorithms. The experimental results on both synthetic datasets and a real-world dataset are presented in Section 5. Finally, the conclusions are drawn in Section 6.

2. Related Work

Our work mainly relates to two areas: clustering heterogeneous information networks and tensor factorization.

2.1. Clustering Heterogeneous Information Networks. Clustering is an unsupervised learning method to recognize the distribution and hidden structures in data, which is a basic and significant mission for pattern recognition and machine learning. Since MacQueen first proposed K-means [4] in 1967, many subtle algorithms have been developed for clustering in the past decades. However, most existing algorithms, such as the hierarchical clustering algorithm [5], density-based clustering [6], mesh-based clustering [7], the fuzzy clustering algorithm [8], and spectral clustering [9], are designed to analyze point sets or homogeneous information networks, which are composed of only one type of objects and links.

In real-world applications, the datasets are often organized as heterogeneous information networks, where the objects and the relations between them are of more than one type. In recent years, researchers have made significant progress on clustering heterogeneous information networks [10, 11], largely focused on four main directions. The first is ranking-based clustering. Sun et al. [1] developed the RankClus algorithm, which integrates clustering with ranking for clustering bityped networks; its extension NetClus [2] was developed for the star network schema. They have proven that ranking and clustering can mutually enhance each other. GPNRankClus [12] assumes that edges in heterogeneous information networks follow a Poisson distribution; this method can simultaneously achieve both clustering and ranking in a heterogeneous information network. In addition, FctClus [13] achieves a higher computational speed and greater clustering accuracy when applied to heterogeneous information networks, but, the same as NetClus, the FctClus algorithm can only handle the star network schema. For a general network schema, HeProjI [14] projects the network into a number of bityped or star-schema subnetworks and performs ranking-based clustering in each subnetwork.

The second direction involves metapath based clustering algorithms [15, 16]. A metapath is a connected path defined on the network schema of a heterogeneous information network, which represents a composite semantic relation between two objects. PathSim (metapath based top-k similarity search) [3] measures the similarity between objects of the same type based on metapaths in heterogeneous information networks. However, it has a limitation: the metapath must be symmetric; that is, PathSim cannot work on different types of objects. The PathSelClus algorithm in [15-17] integrates metapath selection with user guidance to cluster objects in networks, where user-provided seeds for each cluster act as guidance.

The third direction is structural-based clustering. Sun et al. proposed a probabilistic clustering method [18] to deal with heterogeneous information networks with incomplete attributes, which integrates the incomplete attribute information and the network structure information. NetSim [19] is a structural-based similarity measure between objects for x-star networks. Xu proposed a Bayesian probabilistic model based on network structural information for clustering heterogeneous information networks.

The final direction is clustering based on social network features. Zhou and Liu designed the SI-Cluster algorithm [20], which adopts the heat diffusion procedure to model the social influence and then measures the similarity between objects.

2.2. Tensor Factorization. A tensor is a multidimensional array in which the elements are addressed by more than two indices. Tensor factorization has been studied since the early 20th century [21-25]. Two of the most popular tensor factorization methods are Tucker decomposition [21, 24] and canonical decomposition using parallel factors (CANDECOMP/PARAFAC) [23, 24]. CANDECOMP/PARAFAC is also named CP decomposition. It is worth noting that CP decomposition is a special case of Tucker decomposition.

Clustering problems based on tensor factorization are often modeled as optimization problems [26], but it has been proven in [27] that tensor clustering formulations are NP-hard. In the past years, many approximation algorithms for tensor clustering have been proposed [28-30]. These theories provide a new perspective for clustering heterogeneous information networks. Tensor factorization based clustering has also been used in some specific applications; examples include link prediction in higher-order network structures [31, 32], collaborative filtering in recommendation systems [33], community detection in multigraphs [34], graph clustering [35], and modeling multisource datasets [36].

The Alternating Least Squares (ALS) algorithm [22, 37, 38] is one of the most famous and commonly used algorithms for tensor factorization; it iteratively updates one component at each round while holding the others constant. However, ALS suffers from some limitations: for example, ALS may converge to a local minimum, and the memory consumption may explode when the scale of the tensor is large. Nonlinear optimization is another option for obtaining the tensor factorization, such as the nonlinear conjugate gradient method [39], Newton based optimization [40], the randomized block sampling method [41], and stochastic gradient descent [42]. In this paper, we adopt stochastic gradient descent algorithms with a Tikhonov regularization term in the loss function to solve the tensor CP decomposition based clustering.

3. Preliminaries

First, we introduce some related concepts and notations of tensors used in this paper; more details about tensor algebra can be found in [24, 43]. The order of a tensor is the number of dimensions, also known as ways or modes. We follow the convention used in [23] to denote scalars by lowercase letters, for example, $a, b, c$; vectors (one mode) by boldface lowercase letters, for example, $\mathbf{a}, \mathbf{b}, \mathbf{c}$; matrices (two modes) by boldface capital letters, for example, $\mathbf{A}, \mathbf{B}, \mathbf{C}$; and tensors (three modes or more) by boldface calligraphic letters, for example, $\mathcal{X}, \mathcal{Y}, \mathcal{Z}$. Elements of a matrix or a tensor are denoted by lowercase letters with subscripts; that is, the $(i_1, i_2, \ldots, i_N)$th element of an $N$th-order tensor $\mathcal{X}$ is denoted by $x_{i_1 i_2 \cdots i_N}$. The notations of tensor algebra used in this paper are summarized in Notations.

If an $N$th-order tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ can be written as the outer product of $N$ vectors, that is, $\mathcal{X} = \mathbf{a}^{(1)} \circ \mathbf{a}^{(2)} \circ \cdots \circ \mathbf{a}^{(N)}$ with $\mathbf{a}^{(n)} \in \mathbb{R}^{I_n}$, $n = 1, 2, \ldots, N$, tensor $\mathcal{X}$ is called a rank-one tensor. The CP decomposition represents a tensor as a sum of $R$ rank-one tensors; that is, the CP decomposition of $\mathcal{X}$ is $\mathcal{X} = \sum_{r=1}^{R} \mathbf{a}^{(1)}_r \circ \mathbf{a}^{(2)}_r \circ \cdots \circ \mathbf{a}^{(N)}_r$, where $R$ is a positive integer and $\mathbf{a}^{(n)}_r \in \mathbb{R}^{I_n}$, $r = 1, 2, \ldots, R$, $n = 1, 2, \ldots, N$. The rank of a tensor is defined as the smallest number of rank-one tensors for which the equality holds in the CP decomposition, denoted as $\operatorname{rank}(\mathcal{X}) = \min R$. In fact, the problem of tensor rank determination is NP-hard [27].

Let the factor matrices be $\mathbf{A}^{(n)} = [\mathbf{a}^{(n)}_1, \mathbf{a}^{(n)}_2, \ldots, \mathbf{a}^{(n)}_R] \in \mathbb{R}^{I_n \times R}$ for $n = 1, 2, \ldots, N$. We denote $[\![\mathbf{A}^{(1)}, \mathbf{A}^{(2)}, \ldots, \mathbf{A}^{(N)}]\!] \equiv \sum_{r=1}^{R} \mathbf{a}^{(1)}_r \circ \mathbf{a}^{(2)}_r \circ \cdots \circ \mathbf{a}^{(N)}_r$. By minimizing the Frobenius norm of the difference between $\mathcal{X}$ and its CP approximation, the CP decomposition can be formulated as an optimization problem:

$$\min \; \frac{1}{2} \left\| \mathcal{X} - [\![\mathbf{A}^{(1)}, \mathbf{A}^{(2)}, \ldots, \mathbf{A}^{(N)}]\!] \right\|_F^2. \quad (1)$$
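To make the CP model in (1) concrete, the following NumPy sketch (our illustration, not code from the paper; all names are ours) builds a tensor from factor matrices as a sum of rank-one outer products and evaluates the Frobenius-norm objective:

```python
import numpy as np

def cp_reconstruct(factors):
    """Rebuild sum_{r=1}^R a_r^(1) o a_r^(2) o ... o a_r^(N) from factor matrices."""
    R = factors[0].shape[1]
    X_hat = np.zeros(tuple(A.shape[0] for A in factors))
    for r in range(R):
        rank_one = factors[0][:, r]
        for A in factors[1:]:
            rank_one = np.multiply.outer(rank_one, A[:, r])  # outer product, one mode at a time
        X_hat += rank_one
    return X_hat

# Example: a 3rd-order tensor of size 4 x 5 x 6 approximated with R = 2 components.
rng = np.random.default_rng(0)
factors = [rng.random((I, 2)) for I in (4, 5, 6)]
X = cp_reconstruct(factors) + 0.01 * rng.random((4, 5, 6))      # noisy observation
loss = 0.5 * np.linalg.norm(X - cp_reconstruct(factors)) ** 2   # the objective in (1)
```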

Then we give the definitions for the information network and the network schema, which are based on the work by Sun et al. [3].

Definition 1 (information network [3]). An information network is a graph $G = (V, E)$ defined on a set of objects $V$ and a set of links $E$, where $V$ belongs to $T$ object types $\mathcal{V} = \{\mathcal{V}_t\}_{t=1}^{T}$ and $E$ belongs to $L$ link types $\mathcal{E} = \{\mathcal{R}_l\}_{l=1}^{L}$.

Specifically, when $T > 1$ or $L > 1$, the information network is called a heterogeneous information network; otherwise, it is called a homogeneous information network.

We denote the object set of type $\mathcal{V}_t$ as $\{v^t_n\}_{n=1}^{N_t}$, where $N_t$ is the number of objects of type $\mathcal{V}_t$, that is, $N_t = |\mathcal{V}_t|$, $t = 1, 2, \ldots, T$. The total number of objects in the network is given by $N = \sum_{t=1}^{T} N_t$.

Definition 2 (network schema [3]). The network schema is a metatemplate for a heterogeneous information network $G = (V, E)$, which is a graph defined over object types $\mathcal{V}$ and link types $\mathcal{E}$, denoted by $S_G = (\mathcal{V}, \mathcal{E})$.

A network schema $S_G = (\mathcal{V}, \mathcal{E})$ shows how many types of objects there are in the network $G = (V, E)$ and which type the links between different object types belong to. Figure 1 shows the network schema of DBLP, which follows a star network schema.

4. Tensor CP Decomposition Based Clustering Framework

4.1. Tensor Construction. According to Definition 2, we know that the network schema $S_G = (\mathcal{V}, \mathcal{E})$ is a metatemplate for the given heterogeneous information network $G = (V, E)$. In other words, $G = (V, E)$ is an instance of $S_G = (\mathcal{V}, \mathcal{E})$, so we can find at least one subnetwork of $G = (V, E)$ which follows the schema $S_G = (\mathcal{V}, \mathcal{E})$.

Definition 3 (gene-network). A gene-network, denoted by $G' = (V', E')$, is a minimum subnetwork of $G = (V, E)$ which follows the schema $S_G = (\mathcal{V}, \mathcal{E})$.

It is easy to see that a gene-network is one of the smallest instances of $S_G = (\mathcal{V}, \mathcal{E})$ in the set of subnetworks of $G = (V, E)$.

Figure 2: An illustration of the tensor CP decomposition method for clustering a heterogeneous information network.

For example, a gene-network in the DBLP network (Figure 1), which contains four types of objects $A, P, V, T$, denoted by $G' = (\{v^A_i, v^P_j, v^V_m, v^T_n\}, \{\langle v^A_i, v^P_j \rangle, \langle v^P_j, v^V_m \rangle, \langle v^P_j, v^T_n \rangle\})$, represents the semantic relation "an Author $v^A_i$ writes a Paper $v^P_j$ published in the Venue $v^V_m$ and containing the Term $v^T_n$." For simplicity, we can use the subscript of each object in $G'$ to mark the corresponding gene-network; in this example, the gene-network $G'$ can be marked by $G'_{ijmn}$.

Let $\mathcal{X}$ be a $T$th-order tensor of size $N_1 \times N_2 \times \cdots \times N_T$; each mode of $\mathcal{X}$ represents one type of objects in the network $G$. An arbitrary element $x_{n_1 n_2 \cdots n_T} \in \{0, 1\}$, for $n_t = 1, 2, \ldots, N_t$, is an indicator of whether the corresponding gene-network $G'_{n_1 n_2 \cdots n_T}$ exists, that is,

$$x_{n_1 n_2 \cdots n_T} = \begin{cases} 1, & \text{if } \exists\, G'_{n_1 n_2 \cdots n_T}, \\ 0, & \text{otherwise}. \end{cases} \quad (2)$$

Then the heterogeneous information network $G = (V, E)$ can be represented in tensor form as $\mathcal{X}$.
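As an illustration of this construction (a sketch under our own naming, not the paper's code), the tensor $\mathcal{X}$ can be stored in the coordinate (COO) format of [47]: one row of subscripts per observed gene-network, with every stored value equal to 1 as in (2).

```python
import numpy as np

# Each gene-network G'_{n1 n2 ... nT} is a tuple of object indices, one per type.
# For the DBLP schema (author, paper, venue, term) a tuple could be (i, j, m, n).
gene_networks = [(0, 0, 1, 3), (0, 1, 0, 2), (2, 1, 0, 2)]  # toy example

subs = np.array(gene_networks)        # nnz(X) x T matrix of subscripts (the matrix M of Sec. 4.4)
vals = np.ones(len(gene_networks))    # every stored element of X equals 1, cf. (2)
shape = tuple(subs.max(axis=0) + 1)   # tensor size N_1 x N_2 x ... x N_T
```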

4.2. Problem Formulation. Using the tensor representation $\mathcal{X}$ of $G = (V, E)$, we can partition the multityped objects into different clusters by the CP decomposition. We assume that there are $K$ clusters in $G = (V, E)$ and denote $\mathbf{U}^{(t)} \in \mathbb{R}^{N_t \times K}$, $t = 1, 2, \ldots, T$, as the cluster indication matrix of the $t$th type of objects. Then the CP decomposition of $\mathcal{X}$ is

$$\min \left\| \mathcal{X} - [\![\mathbf{U}^{(1)}, \mathbf{U}^{(2)}, \ldots, \mathbf{U}^{(T)}]\!] \right\|_F^2. \quad (3)$$

Each column $\mathbf{u}^{(t)}_k = [u^{(t)}_{1k}, u^{(t)}_{2k}, \ldots, u^{(t)}_{N_t k}]^\top$ of the factor matrix $\mathbf{U}^{(t)}$ is the probability vector of the objects of the $t$th type belonging to the $k$th cluster. In other words, the $k$th cluster is composed of the $k$th rank-one tensor in the CP decomposition, that is, $\mathbf{u}^{(1)}_k \circ \mathbf{u}^{(2)}_k \circ \cdots \circ \mathbf{u}^{(T)}_k$.

Figure 2 gives an example of the tensor CP decomposition method for clustering a heterogeneous information network. The left part is the original network with three types of objects, the middle cube is a 3-mode tensor, and the right part is the CP decomposition of the 3-mode tensor, which is also the partition of the original network. The three types of objects are the yellow rounds, the blue squares, and the red triangles, respectively; the number within each object is its identifier. Each element (black dot in the middle cube) of the tensor represents a gene-network in the network (black dashed circles on the left). Each component (black dashed circles on the right) of the CP decomposition gives one cluster of the original network.

The problem in (3) is an NP-hard nonconvex optimization problem, which has a continuous manifold of equivalent solutions [39]. In other words, the global minimum is drowned in many local minima, which makes it difficult to find. In real-world scenarios, the objects in heterogeneous information networks may belong to more than one cluster; that is, the clusters are overlapping. However, the number of clusters that the vast majority of objects may belong to is much smaller than the total number of clusters. That is, most of the elements in $\mathbf{U}^{(t)}$ should be zero; that is, $\mathbf{U}^{(t)}$ should be sparse. To overcome these two challenges, we introduce a Tikhonov regularization term, proposed by Paatero [44, 45], into the objective function and replace the objective function with the following loss function:

$$\mathcal{L}\left(\mathcal{X}; \mathbf{U}^{(1)}, \mathbf{U}^{(2)}, \ldots, \mathbf{U}^{(T)}\right) = \frac{1}{2} \left\| \mathcal{X} - [\![\mathbf{U}^{(1)}, \mathbf{U}^{(2)}, \ldots, \mathbf{U}^{(T)}]\!] \right\|_F^2 + \frac{\lambda}{2} \sum_{t=1}^{T} \left\| \mathbf{U}^{(t)} \right\|_F^2, \quad (4)$$

where $\lambda > 0$ is a regularization parameter. Let

$$f\left(\mathcal{X}; \mathbf{U}^{(1)}, \mathbf{U}^{(2)}, \ldots, \mathbf{U}^{(T)}\right) = \frac{1}{2} \left\| \mathcal{X} - [\![\mathbf{U}^{(1)}, \mathbf{U}^{(2)}, \ldots, \mathbf{U}^{(T)}]\!] \right\|_F^2 \quad (5)$$

be the squared loss component in $\mathcal{L}$ and let

$$g\left(\mathbf{U}^{(1)}, \mathbf{U}^{(2)}, \ldots, \mathbf{U}^{(T)}\right) = \frac{\lambda}{2} \sum_{t=1}^{T} \left\| \mathbf{U}^{(t)} \right\|_F^2 \quad (6)$$

be the Tikhonov regularization term in $\mathcal{L}$, respectively. Then

$$\mathcal{L} = f + g. \quad (7)$$

The Tikhonov regularization term $g$ in the loss function $\mathcal{L}$ has an encouraging property: it makes the Frobenius norms of all factor matrices in the optimization equal, that is,

$$\left\| \mathbf{U}^{(1)} \right\|_F = \left\| \mathbf{U}^{(2)} \right\|_F = \cdots = \left\| \mathbf{U}^{(T)} \right\|_F. \quad (8)$$

Therefore, the local minima of the loss function $\mathcal{L}$ become isolated, and any replacement and scaling of a satisfactory solution will escape from the optimization; the details of the proof can be found in [39]. Meanwhile, the Tikhonov regularization term can ensure the sparsity of the factor matrices by penalizing the number of nonzero elements.

Therefore, the tensor CP decomposition method for clustering heterogeneous information networks can be formalized as

$$\begin{aligned} \min_{\mathbf{U}^{(1)}, \mathbf{U}^{(2)}, \ldots, \mathbf{U}^{(T)}} \quad & \mathcal{L}\left(\mathcal{X}; \mathbf{U}^{(1)}, \mathbf{U}^{(2)}, \ldots, \mathbf{U}^{(T)}\right) \\ \text{s.t.} \quad & \sum_{k=1}^{K} u^{(t)}_{nk} = 1, \quad \forall t, \forall n, \\ & u^{(t)}_{nk} \in [0, 1], \quad \forall t, \forall n, \forall k, \\ & \sum_{n=1}^{N_t} u^{(t)}_{nk} > 0, \quad \forall t, \forall k, \end{aligned} \quad (9)$$

where $n = 1, 2, \ldots, N_t$; $t = 1, 2, \ldots, T$; $k = 1, 2, \ldots, K$; and $K < \min\{N_1, N_2, \ldots, N_T\}$ is the total number of clusters. In (9), we divide $\mathcal{X}$ into $K$ clusters and obtain the structure of each cluster, which includes the distribution of each object. The first constraint in (9) guarantees that the sum of probabilities for each object belonging to all clusters is 1. The second constraint in (9) requires each probability to be in the range $[0, 1]$. The last constraint in (9) makes sure that there is no empty cluster in any mode.

4.3. The Stochastic Gradient Descent Algorithms. Stochastic gradient descent is a mature and widely used tool for optimizing various models in machine learning, such as artificial neural networks, support vector machines, and logistic regression. In this section, the regularized clustering problem in (9) is addressed by stochastic gradient descent algorithms. The details of the tensor algebra and properties used in this section can be found in [43].

First, we review the stochastic gradient descent algorithm. To solve an optimization problem $\min_x Q(x)$, where $Q(x)$ is a differentiable objective function to be minimized and $x$ is a variable, the stochastic gradient descent update of $x$ can be described as

$$x \leftarrow x - \eta \nabla Q(x), \quad (10)$$

where $\eta$ is a positive number named the learning rate or step size. The convergence speed of the stochastic gradient descent algorithm depends on the choice of the learning rate $\eta$ and the initial solution.

Though the stochastic gradient descent algorithm may converge to a local minimum at a linear speed, the efficiency of the algorithm near the optimal point is not all roses [46]. To speed up the final optimization phase, an extension named the second-order stochastic algorithm is designed in [46], which replaces the learning rate $\eta$ with the inverse of the second-order derivative of the objective function, that is,

$$x \leftarrow x - \eta \left( \nabla^2 Q(x) \right)^{-1} \nabla Q(x). \quad (11)$$

Now we apply the stochastic gradient descent and the second-order stochastic algorithm to the clustering problem in (9) and propose two algorithms, named SGDClus (Stochastic Gradient Descent for Clustering) and SOSClus (Second-Order Stochastic for Clustering), respectively.

4.3.1. SGDClus. In SGDClus, we apply the stochastic gradient descent algorithm to the clustering problem in (9). According to (10), each factor matrix $\mathbf{U}^{(t)}$, $t = 1, 2, \ldots, T$, is updated by the rule

$$\mathbf{U}^{(t)} \leftarrow \mathbf{U}^{(t)} - \eta \frac{\partial \mathcal{L}}{\partial \mathbf{U}^{(t)}} = \mathbf{U}^{(t)} - \eta \left( \frac{\partial f}{\partial \mathbf{U}^{(t)}} + \frac{\partial g}{\partial \mathbf{U}^{(t)}} \right). \quad (12)$$

Actually, $\partial g / \partial \mathbf{U}^{(t)}$ is easy to obtain from (6), that is,

$$\frac{\partial g}{\partial \mathbf{U}^{(t)}} = \lambda \mathbf{U}^{(t)}. \quad (13)$$

To compute $\partial f / \partial \mathbf{U}^{(t)}$, $f(\mathcal{X}; \mathbf{U}^{(1)}, \mathbf{U}^{(2)}, \ldots, \mathbf{U}^{(T)})$ can be rewritten by matricization of $\mathcal{X}$ along the $t$th mode as

$$f^{(t)} = \frac{1}{2} \left\| \mathbf{X}_{(t)} - \mathbf{U}^{(t)} \left( \odot^{(t)} \mathbf{U} \right)^\top \right\|_F^2, \quad (14)$$

where $\odot^{(t)} \mathbf{U} = \mathbf{U}^{(T)} \odot \cdots \odot \mathbf{U}^{(t+1)} \odot \mathbf{U}^{(t-1)} \odot \cdots \odot \mathbf{U}^{(1)}$. Then we have

$$\begin{aligned} \frac{\partial f^{(t)}}{\partial \mathbf{U}^{(t)}} &= \frac{\partial \operatorname{Tr}\left(\left(\mathbf{X}_{(t)} - \mathbf{U}^{(t)}(\odot^{(t)}\mathbf{U})^{\top}\right)\left(\mathbf{X}_{(t)} - \mathbf{U}^{(t)}(\odot^{(t)}\mathbf{U})^{\top}\right)^{\top}\right)}{2\,\partial \mathbf{U}^{(t)}} \\ &= \frac{\partial \operatorname{Tr}\left(\mathbf{X}_{(t)}\mathbf{X}_{(t)}^{\top}\right)}{2\,\partial \mathbf{U}^{(t)}} - \frac{\partial \operatorname{Tr}\left(2\,\mathbf{X}_{(t)}(\odot^{(t)}\mathbf{U})(\mathbf{U}^{(t)})^{\top}\right)}{2\,\partial \mathbf{U}^{(t)}} + \frac{\partial \operatorname{Tr}\left(\left(\mathbf{U}^{(t)}(\odot^{(t)}\mathbf{U})^{\top}\right)\left(\mathbf{U}^{(t)}(\odot^{(t)}\mathbf{U})^{\top}\right)^{\top}\right)}{2\,\partial \mathbf{U}^{(t)}} \\ &= -\mathbf{X}_{(t)}\left(\odot^{(t)}\mathbf{U}\right) + \mathbf{U}^{(t)}\left(\odot^{(t)}\mathbf{U}\right)^{\top}\left(\odot^{(t)}\mathbf{U}\right). \end{aligned} \quad (15)$$

Another way to compute $\partial f^{(t)} / \partial \mathbf{U}^{(t)}$ is given in [39], which yields the same result as (15). Following the work in [39], we denote

$$\Gamma^{(t)} = \left(\odot^{(t)}\mathbf{U}\right)^{\top}\left(\odot^{(t)}\mathbf{U}\right) = \left((\mathbf{U}^{(1)})^{\top}\mathbf{U}^{(1)}\right) \ast \cdots \ast \left((\mathbf{U}^{(t-1)})^{\top}\mathbf{U}^{(t-1)}\right) \ast \left((\mathbf{U}^{(t+1)})^{\top}\mathbf{U}^{(t+1)}\right) \ast \cdots \ast \left((\mathbf{U}^{(T)})^{\top}\mathbf{U}^{(T)}\right). \quad (16)$$


Algorithm 1: The SGDClus algorithm.

Input: $\mathcal{X}$, $K$, $\lambda$
Output: $\{\mathbf{U}^{(t)}\}_{t=1}^{T}$
(1) Initialize $\{\mathbf{U}^{(t)}\}_{t=1}^{T}$
(2) Set iter ← 1
(3) repeat
(4)   for $t$ ← 1 to $T$ do
(5)     Set $\eta$
(6)     Update $\mathbf{U}^{(t)}$ according to (18)
(7)     Normalize $\mathbf{U}^{(t)}$ according to (19)
(8)   end for
(9)   Set iter ← iter + 1
(10) until $\mathcal{L}(\mathcal{X}; \mathbf{U}^{(1)}, \mathbf{U}^{(2)}, \ldots, \mathbf{U}^{(T)})$ is unchanged or iter reaches the maximum number of iterations

Therefore, the partial derivative of $\mathcal{L}$ is given by

$$\frac{\partial \mathcal{L}}{\partial \mathbf{U}^{(t)}} = -\mathbf{X}_{(t)} \left( \odot^{(t)} \mathbf{U} \right) + \mathbf{U}^{(t)} \Gamma^{(t)} + \lambda \mathbf{U}^{(t)}, \quad (17)$$

and (12) can be rewritten as

$$\mathbf{U}^{(t)} \leftarrow \mathbf{U}^{(t)} - \eta \frac{\partial \mathcal{L}}{\partial \mathbf{U}^{(t)}} = \mathbf{U}^{(t)} \left( \mathbf{I} - \eta \left( \Gamma^{(t)} + \lambda \mathbf{I} \right) \right) + \eta \mathbf{X}_{(t)} \left( \odot^{(t)} \mathbf{U} \right), \quad (18)$$

where $\mathbf{I}$ is an identity matrix. Note that the $\{\mathbf{U}^{(t)}\}_{t=1}^{T}$ derived by (18) do not satisfy the first and second constraints in (9). To satisfy these two constraints, we can normalize each row of $\{\mathbf{U}^{(t)}\}_{t=1}^{T}$ by

$$u^{(t)}_{nk} \leftarrow \frac{u^{(t)}_{nk}}{\sum_{k=1}^{K} u^{(t)}_{nk}}. \quad (19)$$

The pseudocode of SGDClus is given in Algorithm 1.
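A minimal dense NumPy sketch of one SGDClus sweep follows (our illustration; the paper's experiments use the MATLAB Tensor Toolbox, and a practical implementation would exploit sparsity as in Section 4.5). It implements the update (18) followed by the row normalization (19); the clipping line is our own safeguard, not part of (18).

```python
import numpy as np
from functools import reduce

def khatri_rao(matrices):
    """Column-wise Kronecker product of matrices sharing a column count K."""
    K = matrices[0].shape[1]
    return reduce(lambda A, B: (A[:, None, :] * B[None, :, :]).reshape(-1, K), matrices)

def unfold(X, t):
    """Mode-t matricization X_(t) (Kolda's convention)."""
    return np.reshape(np.moveaxis(X, t, 0), (X.shape[t], -1), order='F')

def sgdclus_step(X, U, t, eta, lam):
    """One SGDClus update of U[t] per (18), then row normalization per (19)."""
    others = [U[i] for i in reversed(range(len(U))) if i != t]  # U^(T), ..., U^(1), skipping t
    KR = khatri_rao(others)                                     # the matrix (⊙^(t)U)
    Gamma = KR.T @ KR                                           # Γ^(t), cf. (16)
    K = U[t].shape[1]
    U_new = U[t] @ (np.eye(K) - eta * (Gamma + lam * np.eye(K))) + eta * (unfold(X, t) @ KR)
    U_new = np.clip(U_new, 0.0, None)                           # safeguard (ours): keep entries nonnegative
    return U_new / (U_new.sum(axis=1, keepdims=True) + 1e-12)   # (19)
```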

4.3.2. SOSClus. In SOSClus, we apply the second-order stochastic algorithm to the clustering problem in (9). According to (11), each factor matrix $\mathbf{U}^{(t)}$, $t = 1, 2, \ldots, T$, is updated by the rule

$$\mathbf{U}^{(t)} \leftarrow \mathbf{U}^{(t)} - \eta \left( \frac{\partial^2 \mathcal{L}}{\partial^2 \mathbf{U}^{(t)}} \right)^{-1} \frac{\partial \mathcal{L}}{\partial \mathbf{U}^{(t)}}. \quad (20)$$

According to (17), we can obtain the second-order partial derivative of $\mathcal{L}$ with respect to $\mathbf{U}^{(t)}$, that is,

$$\frac{\partial^2 \mathcal{L}}{\partial^2 \mathbf{U}^{(t)}} = \Gamma^{(t)} + \lambda \mathbf{I}. \quad (21)$$

By substituting (17) and (21) into (20), we have

$$\mathbf{U}^{(t)} \leftarrow (1 - \eta)\, \mathbf{U}^{(t)} + \eta\, \mathbf{X}_{(t)} \left( \odot^{(t)} \mathbf{U} \right) \left( \Gamma^{(t)} + \lambda \mathbf{I} \right)^{-1}. \quad (22)$$

Algorithm 2: The SOSClus algorithm.

Input: $\mathcal{X}$, $K$, $\lambda$
Output: $\{\mathbf{U}^{(t)}\}_{t=1}^{T}$
(1) Initialize $\{\mathbf{U}^{(t)}\}_{t=1}^{T}$
(2) Set iter ← 1
(3) repeat
(4)   for $t$ ← 1 to $T$ do
(5)     Set $\eta$
(6)     Compute $\mathbf{U}^{(t)}_{\text{opt}}$ according to (23)
(7)     Update $\mathbf{U}^{(t)}$ according to (24)
(8)     Normalize $\mathbf{U}^{(t)}$ according to (19)
(9)   end for
(10)  Set iter ← iter + 1
(11) until $\mathcal{L}(\mathcal{X}; \mathbf{U}^{(1)}, \mathbf{U}^{(2)}, \ldots, \mathbf{U}^{(T)})$ is unchanged or iter reaches the maximum number of iterations

Note that $\mathbf{X}_{(t)} (\odot^{(t)} \mathbf{U}) (\Gamma^{(t)} + \lambda \mathbf{I})^{-1}$ is the General Gradient-based Optimization (OPT) [39] solution for updating $\mathbf{U}^{(t)}$ in the regularized CP decomposition, obtained by setting (17) equal to zero; see the details of the proof in [39]. Let

$$\mathbf{U}^{(t)}_{\text{opt}} = \mathbf{X}_{(t)} \left( \odot^{(t)} \mathbf{U} \right) \left( \Gamma^{(t)} + \lambda \mathbf{I} \right)^{-1}. \quad (23)$$

Therefore, the updating rule of $\mathbf{U}^{(t)}$ in SOSClus is, intuitively, a weighted average of the current solution and the OPT solution, that is,

$$\mathbf{U}^{(t)} \leftarrow (1 - \eta)\, \mathbf{U}^{(t)} + \eta\, \mathbf{U}^{(t)}_{\text{opt}}. \quad (24)$$

Actually, SOSClus is a general extension of the General Gradient-based Optimization (OPT) [39] and of the ALS with step size restriction in the randomized block sampling method [41]. In (24), when the learning rate $\eta = 1$, we get the OPT solution; when the regularization parameter $\lambda = 0$, SOSClus becomes the ALS with step size restriction in the randomized block sampling method.

Similar to SGDClus, the $\{\mathbf{U}^{(t)}\}_{t=1}^{T}$ derived by (24) in SOSClus also do not satisfy the first and second constraints in (9), so we normalize each row of $\{\mathbf{U}^{(t)}\}_{t=1}^{T}$ according to (19). The pseudocode of SOSClus is given in Algorithm 2.
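Reusing `khatri_rao` and `unfold` from the SGDClus sketch above, a corresponding sketch of one SOSClus update is given below (again our illustration, not the authors' code): it computes the OPT solution (23) and blends it with the current factor as in (24).

```python
import numpy as np

def sosclus_step(X, U, t, eta, lam):
    """One SOSClus update of U[t] per (22)/(24), then row normalization per (19)."""
    others = [U[i] for i in reversed(range(len(U))) if i != t]
    KR = khatri_rao(others)
    Gamma = KR.T @ KR
    K = U[t].shape[1]
    U_opt = (unfold(X, t) @ KR) @ np.linalg.inv(Gamma + lam * np.eye(K))  # (23)
    U_new = (1.0 - eta) * U[t] + eta * U_opt                              # (24)
    U_new = np.clip(U_new, 0.0, None)                                     # safeguard (ours)
    return U_new / (U_new.sum(axis=1, keepdims=True) + 1e-12)             # (19)
```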

4.4. Feasibility Analysis

Theorem 4. The CP decomposition of $\mathcal{X}$ obtains the clustering of multityped objects in the heterogeneous information network $G = (V, E)$ simultaneously.

Proof. Since the proofs for the different types of objects in the heterogeneous information network $G = (V, E)$ are similar, we simply describe the process for a single type of objects. Without loss of generality, we detail the proof for the $t$th type of objects.

Given a heterogeneous information network $G = (V, E)$ and its tensor representation $\mathcal{X}$, the nonzero elements in $\mathcal{X}$ represent the input gene-networks in $G = (V, E)$, which we want to partition into $K$ clusters $\mathcal{C}_1, \mathcal{C}_2, \ldots, \mathcal{C}_K$. The centre of the cluster $\mathcal{C}_k$ is denoted by $\mathbf{c}_k$. By using the coordinate format [47] as the sparse representation of $\mathcal{X}$, the gene-networks can be denoted as a matrix $\mathbf{M} \in \mathbb{R}^{nnz(\mathcal{X}) \times T}$, where $nnz(\mathcal{X})$ is the number of nonzero elements in $\mathcal{X}$. Each row $\mathbf{m}_n \in \mathbf{M}$, $n = 1, 2, \ldots, nnz(\mathcal{X})$, gives the subscripts of the corresponding nonzero element in $\mathcal{X}$. In other words, $\mathbf{m}_n$ represents a gene-network, and the entries $m_{nt} \in \mathbf{m}_n$, $t = 1, 2, \ldots, T$, are the subscripts of the objects contained in the gene-network.

A traditional clustering approach such as K-means minimizes the sum of differences between the individual gene-networks in each cluster and the corresponding cluster centres, that is,

$$\min_{p_{nk}, \mathbf{c}_k} \sum_{n=1}^{nnz(\mathcal{X})} \left\| \mathbf{m}_n - \sum_{k=1}^{K} p_{nk} \mathbf{c}_k \right\|_F^2, \quad (25)$$

where $p_{nk}$ is the probability of gene-network $\mathbf{m}_n$ belonging to the $k$th cluster. We can also rewrite the problem from the new perspective of clustering the individual objects in the gene-networks as follows:

$$\min_{p_{n_t k}, c_{kt}} \sum_{n=1}^{nnz(\mathcal{X})} \sum_{t=1}^{T} \left\| m_{nt} - \sum_{k=1}^{K} p_{n_t k} c_{kt} \right\|_F^2, \quad (26)$$

where $p_{n_t k}$ is the probability of object $v^t_n$ belonging to the $k$th cluster.

In matrix form, K-means can be formalized as

$$\min_{\mathbf{P}, \mathbf{C}} \left\| \mathbf{M} - \mathbf{P} \mathbf{C} \right\|_F^2, \quad (27)$$

where $\mathbf{P}$ is the cluster indication matrix and $\mathbf{C}$ contains the cluster centres.

By matricization of $\mathcal{X}$ along the $t$th mode, the CP decomposition of $\mathcal{X}$ in (3) can be rewritten as

$$\min_{\mathbf{U}^{(1)}, \mathbf{U}^{(2)}, \ldots, \mathbf{U}^{(T)}} \left\| \mathbf{X}_{(t)} - \mathbf{U}^{(t)} \left( \odot^{(t)} \mathbf{U} \right)^\top \right\|_F^2. \quad (28)$$

$\mathbf{X}_{(t)}$ is the matricization of $\mathcal{X}$ along the $t$th mode, and $\mathbf{M}$ is the sparse representation of $\mathcal{X}$. Let $\mathbf{U}^{(t)} = \mathbf{P}$ and $(\odot^{(t)} \mathbf{U})^\top = \mathbf{C}$; that is, $\mathbf{U}^{(t)}$ is the cluster indication matrix for the $t$th type of objects and $(\odot^{(t)} \mathbf{U})^\top$ holds the cluster centres. So the CP decomposition in (3) is equivalent to the K-means clustering of the $t$th type of objects in the heterogeneous information network $G = (V, E)$.

By matricization of $\mathcal{X}$ in (3) along different modes, we can prove that the CP decomposition is equivalent to the K-means clustering of the other types of objects. So the CP decomposition of $\mathcal{X}$ obtains the clustering of multityped objects in the heterogeneous information network $G = (V, E)$ simultaneously.

It is worth noting that the CP decomposition based clustering is a soft clustering method: the factor matrices indicate the probability of objects belonging to the corresponding clusters. Soft clustering is more in line with reality, because many objects in heterogeneous information networks may belong to several clusters; in other words, the clusters are overlapping. In some cases, the overlapping clusters need to be translated into nonoverlapping clusters, which can be achieved by using different approaches, such as K-means, to cluster the rows of the factor matrices. Usually, the nonoverlapping clusters can be obtained by simply assigning each object to the cluster that has the largest entry in the corresponding row of the factor matrix, as sketched below.
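For instance, with factor matrices `U[0], ..., U[T-1]` as produced by the sketches above (hypothetical variable names), the hard assignment is a one-liner:

```python
hard_labels = [Ut.argmax(axis=1) for Ut in U]  # one cluster label per object, per object type
```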

4.5. Time Complexity Analysis. The main time consumption of updating each factor matrix $\mathbf{U}^{(t)}$ in SGDClus is computing $\partial \mathcal{L} / \partial \mathbf{U}^{(t)}$. According to (17), we need to calculate $\mathbf{X}_{(t)} (\odot^{(t)} \mathbf{U})$ and $\mathbf{U}^{(t)} \Gamma^{(t)}$, respectively. Since $\mathbf{U}^{(t)} \in \mathbb{R}^{N_t \times K}$ and $\mathbf{X}_{(t)} \in \mathbb{R}^{N_t \times \prod_{i=1, i \neq t}^{T} N_i}$, we have $\mathbf{X}_{(t)} (\odot^{(t)} \mathbf{U}) \in \mathbb{R}^{N_t \times K}$, $\Gamma^{(t)} \in \mathbb{R}^{K \times K}$, and $\mathbf{U}^{(t)} \Gamma^{(t)} \in \mathbb{R}^{N_t \times K}$.

Firstly, if we successively calculate the Khatri-Rao product of $T - 1$ matrices followed by a matrix-matrix multiplication when computing $\mathbf{X}_{(t)} (\odot^{(t)} \mathbf{U})$, the intermediate results will be of very large size and the computational cost will be very expensive. In practice, we can reduce the complexity by ignoring unnecessary calculations. Let us observe an element of $\mathbf{X}_{(t)} (\odot^{(t)} \mathbf{U})$, that is,

$$\left( \mathbf{X}_{(t)} \left( \odot^{(t)} \mathbf{U} \right) \right)_{n_t, k} = \sum_{\{n_i\}_{i=1, i \neq t}^{T}} \left( x_{n_t, \prod_{i=1, i \neq t}^{T} n_i} \prod_{i=1, i \neq t}^{T} u^{(i)}_{n_i, k} \right). \quad (29)$$

$x_{n_t, \prod_{i=1, i \neq t}^{T} n_i}$ is an element in the matricization of $\mathcal{X}$, which represents a corresponding gene-network in the heterogeneous information network. When $x_{n_t, \prod_{i=1, i \neq t}^{T} n_i} = 0$, we can skip the following Khatri-Rao product. Hence, only the nonzero elements in $\mathcal{X}$ need to be computed. Therefore, the time complexity for computing $\mathbf{X}_{(t)} (\odot^{(t)} \mathbf{U})$ is $O(nnz(\mathcal{X}) N_t K)$.

Secondly, the element of $\mathbf{U}^{(t)} \Gamma^{(t)}$ is given by

$$\left( \mathbf{U}^{(t)} \Gamma^{(t)} \right)_{n_t, k} = \sum_{j=1}^{K} \left( u^{(t)}_{n_t, j} \prod_{i=1, i \neq t}^{T} \sum_{n_i=1}^{N_i} u^{(i)}_{n_i, j} u^{(i)}_{n_i, k} \right). \quad (30)$$

So the time complexity of computing $\mathbf{U}^{(t)} \Gamma^{(t)}$ is $O((N - N_t) K^2)$, where $N = \sum_{t=1}^{T} N_t$ is the total number of objects in the network.

Above all, the time complexity of each iteration in SGDClus is $O(nnz(\mathcal{X}) N K + (T - 1) N K^2)$. Note that in real-world heterogeneous information networks, the number of clusters $K$ and the number of object types $T$ are usually far less than $N$, that is, $K \ll N$ and $T \ll N$.
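The sparsity trick behind (29) can be sketched as follows (our naming; this is the classic sparse matricized-tensor-times-Khatri-Rao product): loop over the $nnz(\mathcal{X})$ stored gene-networks only, never forming the dense unfolding $\mathbf{X}_{(t)}$.

```python
import numpy as np

def sparse_mttkrp(subs, vals, U, t):
    """Compute X_(t)(⊙^(t)U) from a COO tensor: subs is nnz x T, vals has length nnz."""
    N_t, K = U[t].shape
    out = np.zeros((N_t, K))
    for idx, v in zip(subs, vals):
        prod = v * np.ones(K)
        for i, U_i in enumerate(U):
            if i != t:
                prod = prod * U_i[idx[i], :]   # the product over modes i != t in (29)
        out[idx[t], :] += prod                 # accumulate into row n_t
    return out
```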

According to (22), the time complexity of updating each factor matrix $\mathbf{U}^{(t)}$ in SOSClus is composed of three components: $\mathbf{X}_{(t)} (\odot^{(t)} \mathbf{U})$, $(\Gamma^{(t)} + \lambda \mathbf{I})^{-1}$, and their product. Compared to SGDClus, only the time consumption of computing the inverse of $(\Gamma^{(t)} + \lambda \mathbf{I})$ is additional in SOSClus. Since $(\Gamma^{(t)} + \lambda \mathbf{I})$ is a $K \times K$ square matrix, computing its inverse costs $O(K^3)$. Nevertheless, $O(K^3)$ is usually negligible because $K \ll N$.

5. Experiments and Results

In this section, we present several experiments on synthetic and real-world heterogeneous information network datasets and compare the performance with a number of state-of-the-art clustering methods.

5.1. Evaluation Metrics and Experimental Setting

5.1.1. Evaluation Metrics. In the experiments, we adopt the Normalized Mutual Information (NMI) [48] and Accuracy (AC) as our performance measurements.

NMI is used to measure the mutual dependence between the clustering result and the ground truth. Given $N$ objects, $K$ clusters, one clustering result, and the ground truth classes for the objects, let $n(i, j)$, $i, j = 1, 2, \ldots, K$, be the number of objects labeled $i$ in the clustering result but labeled $j$ in the ground truth. The joint distribution can be defined as $p(i, j) = n(i, j) / N$, the marginal distribution of rows can be calculated as $p_1(j) = \sum_{i=1}^{K} p(i, j)$, and the marginal distribution of columns can be calculated as $p_2(i) = \sum_{j=1}^{K} p(i, j)$. Then the NMI is defined as

$$\text{NMI} = \frac{\sum_{i=1}^{K} \sum_{j=1}^{K} p(i, j) \log \left( p(i, j) / \left( p_1(j)\, p_2(i) \right) \right)}{\sqrt{\sum_{j=1}^{K} p_1(j) \log p_1(j) \sum_{i=1}^{K} p_2(i) \log p_2(i)}}. \quad (31)$$

The NMI ranges from 0 to 1; the larger the value of NMI, the better the clustering result.

AC is used to compute the clustering accuracy, which measures the percentage of correct clustering results. AC is defined as

$$\text{AC} = \frac{\sum_{t=1}^{T} \sum_{n=1}^{N_t} \delta \left( \text{map}(v^t_n), \text{label}(v^t_n) \right)}{\sum_{t=1}^{T} N_t}, \quad (32)$$

where $\text{map}(v^t_n)$ is the cluster label of the object $v^t_n$, $\text{label}(v^t_n)$ is the ground truth class of the object $v^t_n$, and $\delta(\cdot)$ is an indicator function:

$$\delta(\cdot) = \begin{cases} 1, & \text{if } \text{map}(v^t_n) = \text{label}(v^t_n), \\ 0, & \text{if } \text{map}(v^t_n) \neq \text{label}(v^t_n). \end{cases} \quad (33)$$

Since both NMI and AC measure the performance of clustering one type of objects, the weighted averages of NMI and AC are also used to measure the overall performance of our proposed methods and the other state-of-the-art methods:

$$\overline{\text{NMI}} = \frac{\sum_{t=1}^{T} N_t\, (\text{NMI})_t}{\sum_{t=1}^{T} N_t}, \qquad \overline{\text{AC}} = \frac{\sum_{t=1}^{T} N_t\, (\text{AC})_t}{\sum_{t=1}^{T} N_t}. \quad (34)$$
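A direct transcription of (31)-(33) into NumPy might look as follows (a sketch; it assumes cluster labels have already been matched to ground-truth classes, e.g. by a Hungarian assignment, which the paper does not detail):

```python
import numpy as np

def nmi(pred, truth, K):
    """Normalized Mutual Information per (31); pred and truth are label arrays in 0..K-1."""
    n = np.zeros((K, K))
    for i, j in zip(pred, truth):
        n[i, j] += 1.0
    p = n / n.sum()                 # joint distribution p(i, j)
    p1 = p.sum(axis=0)              # marginal p_1(j) over ground-truth classes
    p2 = p.sum(axis=1)              # marginal p_2(i) over predicted clusters
    mask = p > 0
    num = np.sum(p[mask] * np.log(p[mask] / np.outer(p2, p1)[mask]))
    den = np.sqrt(np.sum(p1[p1 > 0] * np.log(p1[p1 > 0]))
                  * np.sum(p2[p2 > 0] * np.log(p2[p2 > 0])))
    return num / den

def ac(pred, truth):
    """Accuracy per (32)-(33) for one object type (aligned labels assumed)."""
    return float(np.mean(np.asarray(pred) == np.asarray(truth)))
```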

Table 1: The synthetic datasets.

Dataset   T   K   S                                D
Syn1      2   2   1M = 1000 x 1000                 0.1
Syn2      2   4   10M = 1000 x 10000               0.01
Syn3      4   2   100M = 100 x 100 x 100 x 100     0.1
Syn4      4   4   1000M = 100 x 100 x 100 x 1000   0.01

$T$ is the number of object types in the heterogeneous information network and also the number of modes in the tensor; $K$ is the number of clusters; $S$ is the network scale, $S = N_1 \times N_2 \times \cdots \times N_T$; $D$ is the density of the tensor, that is, the percentage of nonzero elements in the tensor, $D = nnz(\mathcal{X}) / S$.

5.1.2. Experimental Setting. In order to compare the performance of our proposed SGDClus and SOSClus with the other methods impartially, all methods share a common stopping condition, that is,

$$\frac{\left| \mathcal{L}_{\text{iter}} - \mathcal{L}_{\text{iter}-1} \right|}{\mathcal{L}_{\text{iter}-1}} \leq 10^{-6}, \quad (35)$$

or iter reaches the maximum number of iterations. $\mathcal{L}_{\text{iter}}$ and $\mathcal{L}_{\text{iter}-1}$ are the values of the loss function $\mathcal{L}$ at the current, that is, (iter)th, and the previous, that is, (iter - 1)th, iteration, respectively. We set the maximum number of iterations to 1000. Throughout the experiments, the regularization parameter $\lambda$ in SGDClus and SOSClus is fixed as $\lambda = 0.001$.

All experiments are implemented in MATLAB R2015a (version 8.5.0), 64-bit, and the MATLAB Tensor Toolbox (version 2.6, http://www.sandia.gov/~tgkolda/TensorToolbox) is used in our experiments. Since heterogeneous information networks are often sparse in real-world scenarios, that is, most elements in the tensor $\mathcal{X}$ are zeros, we use the sparse format of $\mathcal{X}$ proposed in [47], which is supported by the MATLAB Tensor Toolbox. The experimental results are the average values obtained by running the algorithms ten times on the corresponding datasets.

5.2. Experiments on Synthetic Datasets

5.2.1. The Synthetic Datasets Description. The purpose of using synthetic datasets is to examine whether the proposed tensor CP decomposition clustering framework works well, since the detailed cluster structures of the synthetic datasets are known. In order to make the synthetic datasets resemble a realistic situation, we assume that the distribution of the different types of objects that appear in a gene-network follows Zipf's law (https://en.wikipedia.org/wiki/Zipf's_law). Zipf's law is defined by $f(r; \rho, N) = r^{-\rho} / \sum_{n=1}^{N} n^{-\rho}$, where $N$ is the number of objects, $r$ is the object index, and $\rho$ is the parameter characterizing the distribution; $f(r; \rho, N)$ denotes the frequency of the $r$th object appearing in a gene-network. We set $\rho = 0.95$ and generate 4 synthetic datasets with different parameters. The details of these synthetic datasets are shown in Table 1.
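A possible generator matching this description (our sketch; the paper does not publish its generator) draws each object index in a gene-network from the Zipf distribution above and keeps the distinct tuples as the nonzero subscripts of $\mathcal{X}$:

```python
import numpy as np

def zipf_pmf(N, rho=0.95):
    """f(r; rho, N) = r^(-rho) / sum_n n^(-rho), for ranks r = 1..N."""
    ranks = np.arange(1, N + 1)
    w = ranks ** (-rho)
    return w / w.sum()

def sample_gene_networks(sizes, draws, rho=0.95, seed=0):
    """Draw `draws` candidate gene-networks for a network with len(sizes) object types."""
    rng = np.random.default_rng(seed)
    cols = [rng.choice(N, size=draws, p=zipf_pmf(N, rho)) for N in sizes]
    return np.unique(np.stack(cols, axis=1), axis=0)  # distinct subscripts, i.e. nnz(X) rows

# Toy draw with the Syn3 mode sizes (far fewer nonzeros than Syn3's density 0.1):
subs = sample_gene_networks((100, 100, 100, 100), draws=10000)
```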

5.2.2. Experimental Results. At the beginning of the experiments, we set a common learning rate $\eta = 1/(\text{iter} + 1)$ for SGDClus and SOSClus.

Figure 3: Convergence speed of SGDClus and SOSClus on the 4 synthetic datasets with a common learning rate, that is, $\eta = 1/(\text{iter} + 1)$.

We find that SOSClus has a faster convergence speed and better robustness with respect to the learning rate, which is clearly shown in Figure 3. Although SGDClus may eventually converge to a local minimum, the efficiency of the optimization near a local minimum is not all roses. As shown in Figure 3, the solutions of SGDClus swing around a local minimum. This phenomenon shows that the convergence speed of SGDClus is sensitive to the choice of the learning rate $\eta$.

Then we modify the learning rate to $\eta = 1/(\text{iter} + c)$ for SGDClus, where $c$ is a constant optimized in the experiments; in practice, $c = 27855$ for SGDClus running on Syn3 and $c = 430245$ for SGDClus running on Syn4. The performance comparison of SGDClus with learning rate $\eta = 1/(\text{iter} + c)$ and SOSClus with learning rate $\eta = 1/(\text{iter} + 1)$ on Syn3 and Syn4 is shown in Figure 4. By employing the optimized learning rate, SGDClus converges to a local minimum quickly. However, compared to SOSClus, SGDClus still has no advantage. The hand-drawn blue circles over the curve of SOSClus in Figure 4 show that SOSClus can escape from a local minimum and find the global minimum, while SGDClus just obtains the first local minimum it reaches.

According to (23) and (24), we obtain the solutions of OPT as a by-product of running SOSClus. So we compare the AC and NMI of OPT, SGDClus, and SOSClus on the 4 synthetic datasets, as shown in Figure 5. With the increase of object types in the heterogeneous information networks, the AC and NMI of SOSClus and OPT increase distinctly, while the performance of SGDClus almost does not change. When $T = 2$, the AC and NMI of these three methods on Syn1 and Syn2 are almost equal and low; however, the AC and NMI of SOSClus increase to 1 when $T = 4$. Since the histograms of OPT, SGDClus, and SOSClus on Syn1 and Syn2 are almost the same, and those on Syn3 and Syn4 are also similar, we know that the parameters $K$ and $S$ have no significant effect on the performance. Generally, a larger density $D$ and a larger number of object types $T$ in the network result in higher AC and NMI for SOSClus.

Obviously, in the experiments on the 4 synthetic datasets, SOSClus shows excellent performance: it has a faster convergence speed and better robustness with respect to the learning rate. Meanwhile, SOSClus performs better on AC and NMI because it can escape from a local minimum and find the global minimum.

5.3. Performance Comparison on Real-World Dataset

5.3.1. Real-World Dataset Description. The experiments on the real-world dataset are used to compare the performance of the tensor CP decomposition clustering framework with other state-of-the-art methods.

The real-world dataset extracted from the DBLP database is the DBLP-four-area dataset, which can be downloaded from http://web.cs.ucla.edu/~yzsun/data/DBLP_four_area.zip. It is a four-research-area subset of DBLP and has been used in [2, 3, 12, 13, 15, 16, 18]. The four research areas in the DBLP-four-area dataset are database (DB), data mining (DM), machine learning (ML), and information retrieval (IR). There are five representative conferences in each area, and all related authors, papers published in these conferences, and terms contained in these papers' titles are included. The DBLP-four-area dataset contains 14376 papers with 100 labeled, 14475 authors with 4057 labeled, 20 labeled conferences, and 8920 terms. The density of the DBLP-four-area dataset is $9.01935 \times 10^{-9}$, so we construct a 4-mode tensor of size $14376 \times 14475 \times 20 \times 8920$ with 334832 nonzero elements. We compare the performance of the tensor CP decomposition clustering framework with several other methods on the labeled records in this dataset.

5.3.2. Comparative Methods

(i) NetClus (see [2]). An extended version of RankClus [1], which can deal with networks following the star network schema. The time complexity of NetClus for clustering each object type in each iteration is $O(K|E| + (K^2 + K)N)$, where $K$ is the number of clusters, $|E|$ is the number of edges in the network, and $N$ is the total number of objects in the network.

Figure 4: Convergence speed of SGDClus and SOSClus on Syn3 and Syn4 with different learning rates. The learning rate for SOSClus is $\eta = 1/(\text{iter} + 1)$, while that for SGDClus is $\eta = 1/(\text{iter} + c)$, where $c$ is a constant optimized in the experiments.

Figure 5: The AC and NMI of OPT, SGDClus, and SOSClus on the 4 synthetic datasets.

(ii) PathSelClus (see [15, 16]). A clustering method based on predefined metapaths, which requires user guidance. In PathSelClus, the distance between objects of the same type is measured by PathSim [3], and the method starts from seeds given by the user. The time complexity of PathSelClus for clustering each object type in each iteration is $O((K + 1)|\mathcal{P}| + KN)$, where $|\mathcal{P}|$ is the number of metapath instances in the network. The time complexity of PathSim used by PathSelClus for clustering each object type is $O(Nd)$, where $d$ is the average degree of the objects.

(iii) FctClus (see [13]). A recently proposed clustering method for heterogeneous information networks. As with NetClus, the FctClus method can deal with networks following the star network schema. The time complexity of FctClus for clustering each object type in each iteration is $O(K|E| + TKN)$.

5.3.3. Experimental Results. As the baseline methods can only deal with heterogeneous information networks of specific schemas, we must construct different subnetworks for them. For NetClus and FctClus, the network is organized as a star network schema as in [2, 13], where the paper (P) is the centre type and author (A), conference (C), and term (T) are the attribute types.

Table 2: AC of experiments on the DBLP-four-area dataset.

AC           OPT      SGDClus   SOSClus   NetClus   PathSelClus   FctClus
Paper        0.5882   0.8476    0.9007    0.7154    0.7551        0.7887
Author       0.5872   0.8486    0.9486    0.7177    0.7951        0.8008
Conference   1        0.99      1         0.9172    0.9950        0.9031
avg(AC)      0.5892   0.8493    0.9477    0.7186    0.7951        0.8010

Table 3: NMI of experiments on the DBLP-four-area dataset.

NMI          OPT      SGDClus   SOSClus   NetClus   PathSelClus   FctClus
Paper        0.6557   0.6720    0.8812    0.5402    0.6142        0.7152
Author       0.6539   0.8872    0.8822    0.5488    0.6770        0.6012
Conference   1        0.8497    1         0.8858    0.9906        0.8248
avg(NMI)     0.6556   0.8778    0.8827    0.5503    0.6770        0.6050

Table 4: Running time of experiments on the DBLP-four-area dataset.

Running time (s)   OPT    SGDClus   SOSClus   NetClus   PathSelClus   FctClus
Paper              —      —         —         8026      5423          8084
Author             —      —         —         7437      6811          7749
Conference         —      —         —         6584      6293          6698
Total time         6726   4324      8184      22047     18527         22531

For PathSelClus, we select the metapaths P-T-P, A-P-C-P-A, and C-P-T-P-C to cluster the papers, authors, and conferences, respectively, and we give each cluster one seed to start.

We model the DBLP-four-area dataset as a 4-mode tensor, where each mode represents one object type. The 4 modes are author (A), paper (P), conference (C), and term (T), respectively; actually, the sequence of the object types is insignificant. Each element of the tensor represents a gene-network in the heterogeneous information network. In the experiments, we set the learning rate for SOSClus to $\eta = 1/(\text{iter} + 1)$ and an optimized learning rate $\eta = 1/(\text{iter} + c)$ with $c = 1000125$ for SGDClus. By running SOSClus, we obtain the solutions of OPT as a by-product, so we compare the experimental results of OPT, SGDClus, and SOSClus on the DBLP-four-area dataset with the three baseline methods; see the details in Tables 2, 3, and 4.

In Tables 2 and 3, SOSClus performs best on average AC and NMI, and SGDClus takes second place. All methods achieve satisfactory AC and NMI on the conferences, since there are only 20 conferences in the network. SGDClus takes the shortest running time, and OPT and SOSClus have an obvious advantage in running time compared with the baselines. The time complexity of SGDClus is $O(nnz(\mathcal{X})NK + (T-1)NK^2)$ and the time complexity of SOSClus is $O(nnz(\mathcal{X})NK + (T-1)NK^2 + K^3)$, where $nnz(\mathcal{X})$ is the number of nonzero elements in $\mathcal{X}$, that is, the number of gene-networks in the heterogeneous information network. We have $nnz(\mathcal{X}) < |\mathcal{P}| \ll |E|$, $K \ll N$, and $T \ll N$. Compared with the time complexities of the three baselines, SGDClus and SOSClus have a slight disadvantage. However, it is worth noting that the three baselines can only cluster one type of objects in the network in each run, while OPT, SGDClus, and SOSClus obtain the clusters of all types of objects simultaneously by running once. This is the reason why only the total time is shown for OPT, SGDClus, and SOSClus in Table 4. Moreover, the total time of OPT, SGDClus, and SOSClus is on the same order of magnitude as the running time of the other baselines for clustering each type of objects, which is consistent with the comparison of the time complexities.

6 Conclusion

In this paper a tensor CP decomposition method for clus-tering heterogeneous information networks is presented Intensor CP decomposition clustering framework each type ofobjects in heterogeneous information network is modeled asone mode of tensor and the gene-networks in the networkare modeled as the elements in tensor In other wordstensor CP decomposition clustering framework can modeldifferent types of objects and semantic relations in the het-erogeneous information network without the restriction ofnetwork schema In addition two stochastic gradient descentalgorithms named SGDClus and SOSClus are designedSGDClus and SOSClus can cluster all types of objects andthe gene-networks simultaneously by running once Theproposed algorithms outperformed other state-of-the-artclustering methods in terms of AC NMI and running time

Notations

$a$: A scalar (lowercase letter)
$\mathbf{a}$: A vector (boldface lowercase letter)
$\mathbf{A}$: A matrix (boldface capital letter)
$\mathcal{X}$: A tensor (calligraphic letter)
$x_{i_1 i_2 \cdots i_N}$: The $(i_1, i_2, \ldots, i_N)$th element of an $N$th-order tensor $\mathcal{X}$
$\mathbf{X}_{(n)}$: Matricization of $\mathcal{X}$ along the $n$th mode
$\ast$: Hadamard product (element-wise product) of two tensors (or matrices or vectors) with the same dimensions
$\odot$: Khatri-Rao product of two matrices
$\|\cdot\|_F$: Frobenius norm of a tensor (or matrix or vector)
$\mathbf{a} \circ \mathbf{b}$: Outer product of two vectors

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported by the National Natural Science Foundation of China (no. 61401482 and no. 61401483).

References

[1] Y. Sun, J. Han, P. Zhao, Z. Yin, H. Cheng, and T. Wu, "RankClus: integrating clustering with ranking for heterogeneous information network analysis," in Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology (EDBT '09), pp. 565–576, Saint Petersburg, Russia, March 2009.
[2] Y. Sun, Y. Yu, and J. Han, "Ranking-based clustering of heterogeneous information networks with star network schema," in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '09), pp. 797–805, Paris, France, July 2009.
[3] Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu, "PathSim: meta path-based top-K similarity search in heterogeneous information networks," PVLDB, vol. 4, no. 11, pp. 992–1003, 2011.
[4] J. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics, pp. 281–297, University of California Press, Berkeley, Calif, USA, 1967.
[5] S. Guha, R. Rastogi, and K. Shim, "Cure: an efficient clustering algorithm for large databases," Information Systems, vol. 26, no. 1, pp. 35–58, 2001.
[6] M. Ester, H. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96), E. Simoudis, J. Han, and U. M. Fayyad, Eds., pp. 226–231, AAAI Press, Portland, Ore, USA, 1996.
[7] W. Wang, J. Yang, and R. R. Muntz, "STING: a statistical information grid approach to spatial data mining," in Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB '97), M. Jarke, M. J. Carey, K. R. Dittrich, F. H. Lochovsky, P. Loucopoulos, and M. A. Jeusfeld, Eds., pp. 186–195, Morgan Kaufmann, Athens, Greece, August 1997, http://www.vldb.org/conf/1997/P186.PDF.
[8] E. H. Ruspini, "New experimental results in fuzzy clustering," Information Sciences, vol. 6, no. 73, pp. 273–284, 1973.
[9] U. von Luxburg, "A tutorial on spectral clustering," Statistics and Computing, vol. 17, no. 4, pp. 395–416, 2007.
[10] C. Shi, Y. Li, J. Zhang, Y. Sun, and P. S. Yu, "A survey of heterogeneous information network analysis," IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 1, pp. 17–37, 2017.
[11] J. Han, Y. Sun, X. Yan, and P. S. Yu, "Mining knowledge from data: an information network analysis approach," in Proceedings of the IEEE 28th International Conference on Data Engineering (ICDE '12), pp. 1214–1217, IEEE, Washington, DC, USA, April 2012.
[12] J. Chen, W. Dai, Y. Sun, and J. Dy, "Clustering and ranking in heterogeneous information networks via gamma-poisson model," in Proceedings of the SIAM International Conference on Data Mining (SDM '15), pp. 424–432, May 2015.
[13] J. Yang, L. Chen, and J. Zhang, "FctClus: a fast clustering algorithm for heterogeneous information networks," PLoS ONE, vol. 10, no. 6, Article ID e0130086, 2015.
[14] C. Shi, R. Wang, Y. Li, P. S. Yu, and B. Wu, "Ranking-based clustering on general heterogeneous information networks by network projection," in Proceedings of the 23rd ACM International Conference on Information and Knowledge Management (CIKM '14), pp. 699–708, ACM, Shanghai, China, November 2014.
[15] Y. Sun, B. Norick, J. Han, X. Yan, P. S. Yu, and X. Yu, "Integrating meta-path selection with user-guided object clustering in heterogeneous information networks," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '12), pp. 1348–1356, ACM, Beijing, China, August 2012.
[16] Y. Sun, B. Norick, J. Han, X. Yan, P. S. Yu, and X. Yu, "PathSelClus: integrating meta-path selection with user-guided object clustering in heterogeneous information networks," ACM Transactions on Knowledge Discovery from Data, vol. 7, no. 3, pp. 723–724, 2013.
[17] X. Yu, Y. Sun, B. Norick, T. Mao, and J. Han, "User guided entity similarity search using meta-path selection in heterogeneous information networks," in Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM '12), pp. 2025–2029, ACM, November 2012.
[18] Y. Sun, C. C. Aggarwal, and J. Han, "Relation strength-aware clustering of heterogeneous information networks with incomplete attributes," Proceedings of the VLDB Endowment, vol. 5, no. 5, pp. 394–405, 2012.
[19] M. Zhang, H. Hu, Z. He, and W. Wang, "Top-k similarity search in heterogeneous information networks with x-star network schema," Expert Systems with Applications, vol. 42, no. 2, pp. 699–712, 2015.
[20] Y. Zhou and L. Liu, "Social influence based clustering of heterogeneous information networks," in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 338–346, ACM, Chicago, Ill, USA, August 2013.
[21] L. R. Tucker, "Some mathematical notes on three-mode factor analysis," Psychometrika, vol. 31, pp. 279–311, 1966.
[22] R. A. Harshman, "Foundations of the PARAFAC procedure: models and conditions for an 'explanatory' multi-mode factor analysis," UCLA Working Papers in Phonetics, 1969.
[23] H. A. L. Kiers, "Towards a standardized notation and terminology in multiway analysis," Journal of Chemometrics, vol. 14, no. 3, pp. 105–122, 2000.
[24] T. G. Kolda and B. W. Bader, "Tensor decompositions and applications," SIAM Review, vol. 51, no. 3, pp. 455–500, 2009.
[25] A. Cichocki, D. Mandic, L. De Lathauwer et al., "Tensor decompositions for signal processing applications: from two-way to multiway component analysis," IEEE Signal Processing Magazine, vol. 32, no. 2, pp. 145–163, 2015.
[26] W. Peng and T. Li, "Tensor clustering via adaptive subspace iteration," Intelligent Data Analysis, vol. 15, no. 5, pp. 695–713, 2011.
[27] J. Hastad, "Tensor rank is NP-complete," Journal of Algorithms: Cognition, Informatics and Logic, vol. 11, no. 4, pp. 644–654, 1990.
[28] S. Metzler and P. Miettinen, "Clustering Boolean tensors," Data Mining & Knowledge Discovery, vol. 29, no. 5, pp. 1343–1373, 2015.
[29] X. Cao, X. Wei, Y. Han, and D. Lin, "Robust face clustering via tensor decomposition," IEEE Transactions on Cybernetics, vol. 45, no. 11, pp. 2546–2557, 2015.
[30] I. Sutskever, R. Salakhutdinov, and J. B. Tenenbaum, "Modelling relational data using Bayesian clustered tensor factorization," in Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS '09), pp. 1821–1828, British Columbia, Canada, December 2009.
[31] B. Ermis, E. Acar, and A. T. Cemgil, "Link prediction in heterogeneous data via generalized coupled tensor factorization," Data Mining & Knowledge Discovery, vol. 29, no. 1, pp. 203–236, 2015.
[32] A. R. Benson, D. F. Gleich, and J. Leskovec, "Tensor spectral clustering for partitioning higher-order network structures," in Proceedings of the SIAM International Conference on Data Mining (SDM '15), pp. 118–126, Vancouver, Canada, May 2015.
[33] L. Xiong, X. Chen, T.-K. Huang, J. G. Schneider, and J. G. Carbonell, "Temporal collaborative filtering with Bayesian probabilistic tensor factorization," in Proceedings of the 10th SIAM International Conference on Data Mining (SDM '10), pp. 211–222, Columbus, Ohio, USA, May 2010.
[34] E. E. Papalexakis, L. Akoglu, and D. Ience, "Do more views of a graph help? Community detection and clustering in multi-graphs," in Proceedings of the 16th International Conference on Information Fusion (FUSION '13), pp. 899–905, Istanbul, Turkey, July 2013.
[35] M. Vandecappelle, M. Bousse, F. V. Eeghem, and L. De Lathauwer, "Tensor decompositions for graph clustering," Internal Report 16-170, ESAT-STADIUS, KU Leuven, Leuven, Belgium, 2016, ftp://ftp.esat.kuleuven.be/pub/SISTA/sistakulak/reports/2016_Tensor_Graph_Clustering.pdf.
[36] W. Shao, L. He, and P. S. Yu, "Clustering on multi-source incomplete data via tensor modeling and factorization," in Proceedings of the 19th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD '15), pp. 485–497, Ho Chi Minh City, Vietnam, 2015.
[37] J. D. Carroll and J.-J. Chang, "Analysis of individual differences in multidimensional scaling via an n-way generalization of 'Eckart-Young' decomposition," Psychometrika, vol. 35, no. 3, pp. 283–319, 1970.
[38] L. De Lathauwer, B. De Moor, and J. Vandewalle, "On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors," SIAM Journal on Matrix Analysis and Applications, vol. 21, no. 4, pp. 1324–1342, 2000.
[39] E. Acar, D. M. Dunlavy, and T. G. Kolda, "A scalable optimization approach for fitting canonical tensor decompositions," Journal of Chemometrics, vol. 25, no. 2, pp. 67–86, 2011.
[40] S. Hansen, T. Plantenga, and T. G. Kolda, "Newton-based optimization for Kullback-Leibler nonnegative tensor factorizations," Optimization Methods & Software, vol. 30, no. 5, pp. 1002–1029, 2015.
[41] N. Vervliet and L. De Lathauwer, "A randomized block sampling approach to canonical polyadic decomposition of large-scale tensors," IEEE Journal on Selected Topics in Signal Processing, vol. 10, no. 2, pp. 284–295, 2016.
[42] R. Ge, F. Huang, C. Jin, and Y. Yuan, "Escaping from saddle points – online stochastic gradient for tensor decomposition," http://arxiv.org/abs/1503.02101.
[43] T. G. Kolda, "Multilinear operators for higher-order decompositions," Tech. Rep. SAND2006-2081, Sandia National Laboratories, 2006, http://www.osti.gov/scitech/biblio/923081.
[44] P. Paatero, "A weighted non-negative least squares algorithm for three-way 'PARAFAC' factor analysis," Chemometrics & Intelligent Laboratory Systems, vol. 38, no. 2, pp. 223–242, 1997.
[45] P. Paatero, "Construction and analysis of degenerate PARAFAC models," Journal of Chemometrics, vol. 14, no. 3, pp. 285–299, 2000.
[46] L. Bottou and N. Murata, "Stochastic approximations and efficient learning," in The Handbook of Brain Theory and Neural Networks, 2nd edition, 2002.
[47] B. W. Bader and T. G. Kolda, "Efficient MATLAB computations with sparse and factored tensors," SIAM Journal on Scientific Computing, vol. 30, no. 1, pp. 205–231, 2007.
[48] A. Strehl and J. Ghosh, "Cluster ensembles – a knowledge reuse framework for combining multiple partitions," Journal of Machine Learning Research, vol. 3, no. 3, pp. 583–617, 2003.




Figure 1: An example of a heterogeneous information network: a bibliographic network extracted from the DBLP database (object types: paper, author, venue, term; edge labels: write/written by, publish/published by, contain/contained in).

target objects and attribute objects. The network schema is a metatemplate of a heterogeneous information network, which shows how many types of objects and links there are in the network [3]. Figure 1 shows a typical star network schema, where the paper (P) is the target object and the others are attribute objects.

A tensor is a generalization of the matrix in high-dimensional space. It is a natural expression of complicated and interpretable structures in high-mode data. In this paper, we model a heterogeneous information network as a tensor without the restriction of network schema. Each type of objects maps onto one mode of the tensor, and the semantic relations among different object types map onto the elements of the tensor. Then a tensor CP decomposition method is adapted to formulate the clustering problem in heterogeneous information networks, and two stochastic gradient descent algorithms are developed, which lead to effective clustering of multityped objects simultaneously. The experimental results on both synthetic datasets and a real-world dataset show that the proposed clustering framework can model heterogeneous information networks efficiently and outperform state-of-the-art clustering methods.

The rest of this paper is organized as follows. In Section 2, we discuss the related work on clustering heterogeneous information networks and on tensor factorization. Section 3 gives some notations and definitions used in this paper. In Section 4, we formulate the clustering problem and describe two stochastic gradient descent algorithms. The experimental results on both synthetic datasets and a real-world dataset are presented in Section 5. Finally, the conclusions are drawn in Section 6.

2. Related Work

Our work mainly relates to clustering in heterogeneous information networks and to tensor factorization.

2.1. Clustering Heterogeneous Information Networks. Clustering is an unsupervised learning method for recognizing the distribution and hidden structures in data, which is a basic and significant task in pattern recognition and machine learning. Since MacQueen first proposed K-means [4] in 1967, many subtle algorithms have been developed for clustering. However, most existing algorithms, such as hierarchical clustering [5], density-based clustering [6], mesh-based clustering [7], fuzzy clustering [8], and spectral clustering [9], are designed to analyze point sets or homogeneous information networks, which are composed of only one type of objects and links.

In real-world applications, the datasets are often organized as heterogeneous information networks, where the objects and the relations between them are of more than one type. In recent years, researchers have made significant progress on clustering heterogeneous information networks [10, 11], largely along four main directions. The first is ranking-based clustering: the RankClus algorithm [1] integrates clustering with ranking for clustering bityped networks, and its extension NetClus [2] was developed for the star network schema; these works have shown that ranking and clustering can mutually enhance each other. GPNRankClus [12] assumes that edges in heterogeneous information networks follow a Poisson distribution and can simultaneously achieve both clustering and ranking. In addition, FctClus [13] achieves a higher computational speed and greater clustering accuracy on heterogeneous information networks but, like NetClus, can only handle the star network schema. For a general network schema, HeProjI [14] projects the network into a number of bityped or star-schema subnetworks and performs ranking-based clustering in each subnetwork.

The second direction involves metapath-based clustering algorithms [15, 16]. A metapath is a connected path defined on the network schema of a heterogeneous information network, which represents a composite semantic relation between two objects. PathSim (metapath-based top-k similarity search) [3] measures the similarity between objects of the same type based on metapaths in heterogeneous information networks. However, it has a limitation: the metapath must be symmetric; that is, PathSim cannot work on different types of objects. The PathSelClus algorithm in [15–17] integrates metapath selection with user guidance to cluster objects in networks, where user-provided seeds for each cluster act as the guidance.

The third direction is structure-based clustering. Sun et al. proposed a probabilistic clustering method [18] to deal with heterogeneous information networks with incomplete attributes, which integrates the incomplete attribute information and the network structure information. NetSim [19] is a structure-based similarity measure between objects for x-star networks. Xu proposed a Bayesian probabilistic model based on network structural information for clustering heterogeneous information networks.

The final direction is clustering based on social network features. Zhou and Liu designed the SI-Cluster algorithm [20], which adopts a heat diffusion procedure to model social influence and then measures the similarity between objects.

2.2. Tensor Factorization. A tensor is a multidimensional array in which the elements are addressed by more than two indices. Tensor factorization has been studied since the early 20th century [21–25]. Two of the most popular tensor factorization methods are Tucker decomposition [21, 24] and canonical decomposition using parallel factors (CANDECOMP/PARAFAC) [23, 24]. CANDECOMP/PARAFAC is also called CP decomposition. It is worth noting that CP decomposition is a special case of Tucker decomposition.

Clustering problems based on tensor factorization are often modeled as optimization problems [26], but it has been proven in [27] that tensor clustering formulations are NP-hard. In the past years, many approximation algorithms for tensor clustering have been proposed [28–30]. These theories provide a new perspective for clustering heterogeneous information networks. Tensor factorization based clustering has also been used in specific applications, including link prediction in heterogeneous data and higher-order network structures [31, 32], collaborative filtering in recommendation systems [33], community detection in multigraphs [34], graph clustering [35], and modeling multisource datasets [36].

The Alternating Least Squares (ALS) algorithm [22, 37, 38] is one of the most famous and commonly used algorithms for tensor factorization; it iteratively updates one component at each round while holding the others constant. However, ALS suffers from some limitations: for example, it may converge to a local minimum, and its memory consumption may explode when the scale of the tensor is large. Nonlinear optimization is another option for computing the tensor factorization, such as the nonlinear conjugate gradient method [39], Newton-based optimization [40], the randomized block sampling method [41], and stochastic gradient descent [42]. In this paper, we adopt stochastic gradient descent algorithms with a Tikhonov-regularized loss function to carry out the tensor CP decomposition based clustering.

3. Preliminaries

First, we introduce some related concepts and notations of tensors used in this paper; more details about tensor algebra can be found in [24, 43]. The order of a tensor is the number of dimensions, also known as ways or modes. We follow the convention used in [23]: scalars are denoted by lowercase letters, for example, $a, b, c$; vectors (one mode) by boldface lowercase letters, for example, $\mathbf{a}, \mathbf{b}, \mathbf{c}$; matrices (two modes) by boldface capital letters, for example, $\mathbf{A}, \mathbf{B}, \mathbf{C}$; and tensors (three modes or more) by boldface calligraphic letters, for example, $\mathcal{X}, \mathcal{Y}, \mathcal{Z}$. Elements of a matrix or a tensor are denoted by lowercase letters with subscripts; that is, the $(i_1, i_2, \ldots, i_N)$th element of an $N$th-order tensor $\mathcal{X}$ is denoted by $x_{i_1 i_2 \cdots i_N}$. The notation for tensor algebra used in this paper is summarized in Notations.

If an $N$th-order tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ can be written as the outer product of $N$ vectors, that is, $\mathcal{X} = \mathbf{a}^{(1)} \circ \mathbf{a}^{(2)} \circ \cdots \circ \mathbf{a}^{(N)}$ with $\mathbf{a}^{(n)} \in \mathbb{R}^{I_n}$, $n = 1, 2, \ldots, N$, then $\mathcal{X}$ is called a rank-one tensor. The CP decomposition represents a tensor as a sum of $R$ rank-one tensors; that is, the CP decomposition of $\mathcal{X}$ is $\mathcal{X} = \sum_{r=1}^{R} \mathbf{a}^{(1)}_r \circ \mathbf{a}^{(2)}_r \circ \cdots \circ \mathbf{a}^{(N)}_r$, where $R$ is a positive integer and $\mathbf{a}^{(n)}_r \in \mathbb{R}^{I_n}$, $r = 1, 2, \ldots, R$, $n = 1, 2, \ldots, N$. The rank of a tensor is defined as the smallest number of rank-one tensors for which equality holds in the CP decomposition, denoted $\operatorname{rank}(\mathcal{X}) = \min R$. In fact, the problem of tensor rank determination is NP-hard [27].

Let the factor matrices be $\mathbf{A}^{(n)} = [\mathbf{a}^{(n)}_1, \mathbf{a}^{(n)}_2, \ldots, \mathbf{a}^{(n)}_R] \in \mathbb{R}^{I_n \times R}$ for $n = 1, 2, \ldots, N$, and denote $[\![\mathbf{A}^{(1)}, \mathbf{A}^{(2)}, \ldots, \mathbf{A}^{(N)}]\!] \equiv \sum_{r=1}^{R} \mathbf{a}^{(1)}_r \circ \mathbf{a}^{(2)}_r \circ \cdots \circ \mathbf{a}^{(N)}_r$. By minimizing the Frobenius norm of the difference between $\mathcal{X}$ and its CP approximation, the CP decomposition can be formulated as an optimization problem:

$$\min \frac{1}{2} \left\| \mathcal{X} - [\![\mathbf{A}^{(1)}, \mathbf{A}^{(2)}, \ldots, \mathbf{A}^{(N)}]\!] \right\|_F^2. \qquad (1)$$
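To make (1) concrete, the following is a minimal NumPy sketch (our own illustration, not the authors' implementation; the function names are ours) that rebuilds a tensor from CP factor matrices and evaluates the objective:

```python
import numpy as np

def cp_reconstruct(factors):
    """Rebuild a tensor from CP factor matrices A^(n) of shape (I_n, R):
    the r-th rank-one component is the outer product of the r-th columns."""
    R = factors[0].shape[1]
    X = np.zeros(tuple(A.shape[0] for A in factors))
    for r in range(R):
        comp = factors[0][:, r]
        for A in factors[1:]:
            comp = np.multiply.outer(comp, A[:, r])  # grow one mode at a time
        X += comp
    return X

def cp_objective(X, factors):
    """0.5 * ||X - [[A^(1), ..., A^(N)]]||_F^2, the objective in (1)."""
    return 0.5 * np.sum((X - cp_reconstruct(factors)) ** 2)
```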

We now give the definitions of the information network and the network schema, which are based on the work by Sun et al. [3].

Definition 1 (information network [3]). An information network is a graph $G = (V, E)$ defined on a set of objects $V$ and a set of links $E$, where $V$ belongs to $T$ object types $\mathcal{V} = \{V_t\}_{t=1}^{T}$ and $E$ belongs to $L$ link types $\mathcal{E} = \{R_l\}_{l=1}^{L}$.

Specifically, when $T > 1$ or $L > 1$, the information network is called a heterogeneous information network; otherwise, it is called a homogeneous information network.

We denote the object set of type $V_t$ as $\{v_{tn}\}_{n=1}^{N_t}$, where $N_t$ is the number of objects of type $V_t$; that is, $N_t = |V_t|$, $t = 1, 2, \ldots, T$. The total number of objects in the network is $N = \sum_{t=1}^{T} N_t$.

Definition 2 (network schema [3]). The network schema is a metatemplate for a heterogeneous information network $G = (V, E)$; it is a graph defined over the object types $\mathcal{V}$ and link types $\mathcal{E}$, denoted by $S_G = (\mathcal{V}, \mathcal{E})$.

A network schema $S_G = (\mathcal{V}, \mathcal{E})$ shows how many types of objects there are in the network $G = (V, E)$ and which type the links between different object types belong to. Figure 1 shows the network schema of DBLP, which follows a star network schema.

4. Tensor CP Decomposition Based Clustering Framework

4.1. Tensor Construction. According to Definition 2, the network schema $S_G = (\mathcal{V}, \mathcal{E})$ is a metatemplate for the given heterogeneous information network $G = (V, E)$; in other words, $G = (V, E)$ is an instance of $S_G = (\mathcal{V}, \mathcal{E})$. So we can find at least one subnetwork of $G = (V, E)$ that follows the schema $S_G = (\mathcal{V}, \mathcal{E})$.

Definition 3 (gene-network). A gene-network, denoted by $G' = (V', E')$, is a minimum subnetwork of $G = (V, E)$ that follows the schema $S_G = (\mathcal{V}, \mathcal{E})$.

It is easy to see that a gene-network is one of the smallest instances of $S_G = (\mathcal{V}, \mathcal{E})$ in the set of subnetworks of $G = (V, E)$. For example, a gene-network in the DBLP network (Figure 1), which contains four types of objects ($A$, $P$, $V$, $T$), denoted by $G' = (\{v^A_i, v^P_j, v^V_m, v^T_n\}, \{\langle v^A_i, v^P_j \rangle, \langle v^P_j, v^V_m \rangle, \langle v^P_j, v^T_n \rangle\})$, represents the semantic relation "an Author $v^A_i$ writes a Paper $v^P_j$ published in the Venue $v^V_m$ and containing the Term $v^T_n$." For simplicity, we use the subscripts of the objects in $G'$ to mark the corresponding gene-network; in this example, the gene-network $G'$ is marked $G'_{ijmn}$.

Figure 2: An illustration of the tensor CP decomposition method for clustering a heterogeneous information network.

Let $\mathcal{X}$ be a $T$th-order tensor of size $N_1 \times N_2 \times \cdots \times N_T$, where each mode of $\mathcal{X}$ represents one type of objects in the network $G$. An arbitrary element $x_{n_1 n_2 \cdots n_T} \in \{0, 1\}$, for $n_t = 1, 2, \ldots, N_t$, is an indicator of whether the corresponding gene-network $G'_{n_1 n_2 \cdots n_T}$ exists; that is,

$$x_{n_1 n_2 \cdots n_T} = \begin{cases} 1, & \text{if } G'_{n_1 n_2 \cdots n_T} \text{ exists}, \\ 0, & \text{otherwise}. \end{cases} \qquad (2)$$

Then the heterogeneous information network $G = (V, E)$ can be represented in tensor form as $\mathcal{X}$.
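As a sketch of this construction (illustrative NumPy; the toy gene-network list is hypothetical), the indicator tensor in (2) is naturally stored in coordinate (COO) format, keeping only the nonzero elements:

```python
import numpy as np

# Hypothetical toy input: each gene-network is a tuple of object indices,
# one per type, e.g., (author, paper, venue, term) under the DBLP schema.
gene_networks = [(0, 0, 1, 2), (0, 1, 0, 3), (1, 1, 0, 3)]

# COO representation of the indicator tensor in (2): 'coords' holds the
# subscripts of the nonzero elements and every stored value is 1.
coords = np.array(gene_networks)       # shape: nnz(X) x T
values = np.ones(len(gene_networks))   # x_{n_1 ... n_T} = 1
shape = (2, 2, 2, 4)                   # N_1 x N_2 x N_3 x N_4
```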

4.2. Problem Formulation. Using the tensor representation $\mathcal{X}$ of $G = (V, E)$, we can partition the multityped objects into different clusters by the CP decomposition. We assume that there are $K$ clusters in $G = (V, E)$ and denote by $\mathbf{U}^{(t)} \in \mathbb{R}^{N_t \times K}$, $t = 1, 2, \ldots, T$, the cluster indication matrix of the $t$th type of objects. Then the CP decomposition of $\mathcal{X}$ is

$$\min \left\| \mathcal{X} - [\![\mathbf{U}^{(1)}, \mathbf{U}^{(2)}, \ldots, \mathbf{U}^{(T)}]\!] \right\|_F^2. \qquad (3)$$

Each column $\mathbf{u}^{(t)}_k = [u^{(t)}_{k1}, u^{(t)}_{k2}, \ldots, u^{(t)}_{kN_t}]^{\top}$ of the factor matrix $\mathbf{U}^{(t)}$ gives the probabilities of the objects of the $t$th type belonging to the $k$th cluster. In other words, the $k$th cluster corresponds to the $k$th rank-one tensor in the CP decomposition, that is, $\mathbf{u}^{(1)}_k \circ \mathbf{u}^{(2)}_k \circ \cdots \circ \mathbf{u}^{(T)}_k$.

Figure 2 gives an example of the tensor CP decomposition method for clustering a heterogeneous information network. The left part is the original network with three types of objects; the middle cube is a 3-mode tensor; and the right part is the CP decomposition of the 3-mode tensor, which is also the partition of the original network. The three types of objects are drawn as yellow circles, blue squares, and red triangles, respectively, and the number within each object is its identifier. Each element (black dot) in the middle cube represents a gene-network in the network (black dashed circle on the left). Each component (black dashed circle on the right) of the CP decomposition gives one cluster of the original network.

The problem in (3) is an NP-hard nonconvex optimization problem, which has a continuous manifold of equivalent solutions [39]. In other words, the global minimum is drowned among many local minima, which makes it difficult to find. Moreover, in real-world scenarios, the objects in a heterogeneous information network may belong to more than one cluster; that is, the clusters are overlapping. However, the number of clusters to which the vast majority of objects may belong is much smaller than the total number of clusters; that is, most of the elements of $\mathbf{U}^{(t)}$ should be zero, so $\mathbf{U}^{(t)}$ should be sparse. To address these two challenges, we introduce a Tikhonov regularization term, proposed by Paatero in [44, 45], into the objective function and replace the objective by the following loss function:

$$\mathcal{L}(\mathcal{X}, \mathbf{U}^{(1)}, \mathbf{U}^{(2)}, \ldots, \mathbf{U}^{(T)}) = \frac{1}{2}\left\|\mathcal{X} - [\![\mathbf{U}^{(1)}, \mathbf{U}^{(2)}, \ldots, \mathbf{U}^{(T)}]\!]\right\|_F^2 + \frac{\lambda}{2}\sum_{t=1}^{T}\left\|\mathbf{U}^{(t)}\right\|_F^2, \qquad (4)$$

where $\lambda > 0$ is a regularization parameter. Let

$$f(\mathcal{X}, \mathbf{U}^{(1)}, \ldots, \mathbf{U}^{(T)}) = \frac{1}{2}\left\|\mathcal{X} - [\![\mathbf{U}^{(1)}, \ldots, \mathbf{U}^{(T)}]\!]\right\|_F^2 \qquad (5)$$

be the squared loss component of $\mathcal{L}$, and let

$$g(\mathbf{U}^{(1)}, \ldots, \mathbf{U}^{(T)}) = \frac{\lambda}{2}\sum_{t=1}^{T}\left\|\mathbf{U}^{(t)}\right\|_F^2 \qquad (6)$$

be the Tikhonov regularization term of $\mathcal{L}$, respectively. Then

$$\mathcal{L} = f + g. \qquad (7)$$

The Tikhonov regularization term $g$ in the loss function $\mathcal{L}$ has an encouraging property: it forces the Frobenius norms of all factor matrices at the optimum to be equal; that is,

$$\left\|\mathbf{U}^{(1)}\right\|_F = \left\|\mathbf{U}^{(2)}\right\|_F = \cdots = \left\|\mathbf{U}^{(T)}\right\|_F. \qquad (8)$$

Therefore, the local minima of the loss function $\mathcal{L}$ become isolated, and any permutation or rescaling of a satisfactory solution is excluded from the optimization; the details of the proof can be found in [39]. Meanwhile, the Tikhonov regularization term encourages sparsity of the factor matrices by penalizing their nonzero elements.

Therefore, the tensor CP decomposition method for clustering heterogeneous information networks can be formalized as

$$\begin{aligned}
\min_{\mathbf{U}^{(1)}, \mathbf{U}^{(2)}, \ldots, \mathbf{U}^{(T)}} \quad & \mathcal{L}(\mathcal{X}, \mathbf{U}^{(1)}, \mathbf{U}^{(2)}, \ldots, \mathbf{U}^{(T)}) \\
\text{s.t.} \quad & \sum_{k=1}^{K} u^{(t)}_{nk} = 1, \quad \forall t, \forall n, \\
& u^{(t)}_{nk} \in [0, 1], \quad \forall t, \forall n, \forall k, \\
& \sum_{n=1}^{N_t} u^{(t)}_{nk} > 0, \quad \forall t, \forall k,
\end{aligned} \qquad (9)$$

where $n = 1, 2, \ldots, N_t$; $t = 1, 2, \ldots, T$; $k = 1, 2, \ldots, K$; and $K < \min\{N_1, N_2, \ldots, N_T\}$ is the total number of clusters. In (9), we divide $\mathcal{X}$ into $K$ clusters and obtain the structure of each cluster, including the distribution of each object. The first constraint in (9) guarantees that the probabilities of each object over all clusters sum to 1. The second constraint requires each probability to lie in the range $[0, 1]$. The last constraint makes sure that there is no empty cluster in any mode.

4.3. The Stochastic Gradient Descent Algorithms. Stochastic gradient descent is a mature and widely used tool for optimizing various models in machine learning, such as artificial neural networks, support vector machines, and logistic regression. In this section, the regularized clustering problem in (9) is addressed by stochastic gradient descent algorithms. The details of the tensor algebra and properties used in this section can be found in [43].

First, we review the stochastic gradient descent algorithm. To solve an optimization problem $\min_x Q(x)$, where $Q(x)$ is a differentiable objective function to be minimized and $x$ is a variable, the stochastic gradient descent update of $x$ can be described as

$$x \leftarrow x - \eta \nabla Q(x), \qquad (10)$$

where $\eta$ is a positive number called the learning rate or step size. The convergence speed of the stochastic gradient descent algorithm depends on the choice of the learning rate $\eta$ and the initial solution.

Though the stochastic gradient descent algorithm may converge to a local minimum at a linear rate, its efficiency near the optimal point is unsatisfactory [46]. To speed up the final optimization phase, an extension named the second-order stochastic algorithm is designed in [46], which scales the gradient by the inverse of the second-order derivative of the objective function; that is,

$$x \leftarrow x - \eta \left(\nabla^2 Q(x)\right)^{-1} \nabla Q(x). \qquad (11)$$
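As a toy illustration of the difference between updates (10) and (11) (our own example on a badly scaled quadratic, not part of the paper):

```python
import numpy as np

# Q(x) = 0.5 x^T A x - b^T x, with gradient A x - b and Hessian A.
A = np.array([[3.0, 0.0], [0.0, 0.5]])
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b

x_sgd = np.zeros(2)
x_sos = np.zeros(2)
for it in range(50):
    eta = 1.0 / (it + 1)
    x_sgd = x_sgd - eta * grad(x_sgd)                      # first-order update (10)
    x_sos = x_sos - eta * np.linalg.solve(A, grad(x_sos))  # second-order update (11)
# x_sos approaches the minimizer A^{-1} b much faster along the
# poorly scaled coordinate, which motivates the SOSClus variant below.
```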

Now we apply the stochastic gradient descent algorithm and the second-order stochastic algorithm to the clustering problem in (9) and propose two algorithms, named SGDClus (Stochastic Gradient Descent for Clustering) and SOSClus (Second-Order Stochastic for Clustering), respectively.

4.3.1. SGDClus. In SGDClus, we apply the stochastic gradient descent algorithm to the clustering problem in (9). According to (10), each factor matrix $\mathbf{U}^{(t)}$, for $t = 1, 2, \ldots, T$, is updated by the rule

$$\mathbf{U}^{(t)} \leftarrow \mathbf{U}^{(t)} - \eta \frac{\partial \mathcal{L}}{\partial \mathbf{U}^{(t)}} = \mathbf{U}^{(t)} - \eta \left( \frac{\partial f}{\partial \mathbf{U}^{(t)}} + \frac{\partial g}{\partial \mathbf{U}^{(t)}} \right). \qquad (12)$$

The term $\partial g / \partial \mathbf{U}^{(t)}$ follows directly from (6); that is,

$$\frac{\partial g}{\partial \mathbf{U}^{(t)}} = \lambda \mathbf{U}^{(t)}. \qquad (13)$$

To compute $\partial f / \partial \mathbf{U}^{(t)}$, $f(\mathcal{X}, \mathbf{U}^{(1)}, \ldots, \mathbf{U}^{(T)})$ can be rewritten by matricization of $\mathcal{X}$ along the $t$th mode as

$$f_{(t)} = \frac{1}{2}\left\|\mathbf{X}_{(t)} - \mathbf{U}^{(t)}\left(\odot^{(t)}\mathbf{U}\right)^{\top}\right\|_F^2, \qquad (14)$$

where $\odot^{(t)}\mathbf{U} = \mathbf{U}^{(T)} \odot \cdots \odot \mathbf{U}^{(t+1)} \odot \mathbf{U}^{(t-1)} \odot \cdots \odot \mathbf{U}^{(1)}$. Then we have

$$\begin{aligned}
\frac{\partial f_{(t)}}{\partial \mathbf{U}^{(t)}}
&= \frac{1}{2}\,\frac{\partial}{\partial \mathbf{U}^{(t)}} \operatorname{Tr}\!\left(\left(\mathbf{X}_{(t)} - \mathbf{U}^{(t)}(\odot^{(t)}\mathbf{U})^{\top}\right)\left(\mathbf{X}_{(t)} - \mathbf{U}^{(t)}(\odot^{(t)}\mathbf{U})^{\top}\right)^{\top}\right) \\
&= \frac{1}{2}\,\frac{\partial}{\partial \mathbf{U}^{(t)}} \operatorname{Tr}\!\left(\mathbf{X}_{(t)}\mathbf{X}_{(t)}^{\top}\right) - \frac{1}{2}\,\frac{\partial}{\partial \mathbf{U}^{(t)}} \operatorname{Tr}\!\left(2\,\mathbf{X}_{(t)}(\odot^{(t)}\mathbf{U})(\mathbf{U}^{(t)})^{\top}\right) + \frac{1}{2}\,\frac{\partial}{\partial \mathbf{U}^{(t)}} \operatorname{Tr}\!\left(\left(\mathbf{U}^{(t)}(\odot^{(t)}\mathbf{U})^{\top}\right)\left(\mathbf{U}^{(t)}(\odot^{(t)}\mathbf{U})^{\top}\right)^{\top}\right) \\
&= -\mathbf{X}_{(t)}\left(\odot^{(t)}\mathbf{U}\right) + \mathbf{U}^{(t)}\left(\odot^{(t)}\mathbf{U}\right)^{\top}\left(\odot^{(t)}\mathbf{U}\right). 
\end{aligned} \qquad (15)$$

Another way to compute $\partial f_{(t)} / \partial \mathbf{U}^{(t)}$ is given in [39], with the same result as (15). Following [39], we denote

$$\Gamma^{(t)} = \left(\odot^{(t)}\mathbf{U}\right)^{\top}\left(\odot^{(t)}\mathbf{U}\right) = \left((\mathbf{U}^{(1)})^{\top}\mathbf{U}^{(1)}\right) \ast \cdots \ast \left((\mathbf{U}^{(t-1)})^{\top}\mathbf{U}^{(t-1)}\right) \ast \left((\mathbf{U}^{(t+1)})^{\top}\mathbf{U}^{(t+1)}\right) \ast \cdots \ast \left((\mathbf{U}^{(T)})^{\top}\mathbf{U}^{(T)}\right). \qquad (16)$$


Algorithm 1: The SGDClus algorithm.

Input: $\mathcal{X}$, $K$, $\lambda$
Output: $\{\mathbf{U}^{(t)}\}_{t=1}^{T}$
(1) Initialize $\{\mathbf{U}^{(t)}\}_{t=1}^{T}$
(2) Set iter ← 1
(3) repeat
(4)   for $t$ ← 1 to $T$ do
(5)     Set $\eta$
(6)     Update $\mathbf{U}^{(t)}$ according to (18)
(7)     Normalize $\mathbf{U}^{(t)}$ according to (19)
(8)   end for
(9)   Set iter ← iter + 1
(10) until $\mathcal{L}(\mathcal{X}, \mathbf{U}^{(1)}, \ldots, \mathbf{U}^{(T)})$ is unchanged or iter reaches the maximum number of iterations

Therefore, the partial derivative of $\mathcal{L}$ is given by

$$\frac{\partial \mathcal{L}}{\partial \mathbf{U}^{(t)}} = -\mathbf{X}_{(t)}\left(\odot^{(t)}\mathbf{U}\right) + \mathbf{U}^{(t)}\Gamma^{(t)} + \lambda \mathbf{U}^{(t)}. \qquad (17)$$

And (12) can be rewritten as

$$\mathbf{U}^{(t)} \leftarrow \mathbf{U}^{(t)} - \eta \frac{\partial \mathcal{L}}{\partial \mathbf{U}^{(t)}} = \mathbf{U}^{(t)}\left(\mathbf{I} - \eta\left(\Gamma^{(t)} + \lambda\mathbf{I}\right)\right) + \eta\,\mathbf{X}_{(t)}\left(\odot^{(t)}\mathbf{U}\right), \qquad (18)$$

where $\mathbf{I}$ is an identity matrix. Note that the $\{\mathbf{U}^{(t)}\}_{t=1}^{T}$ derived by (18) do not satisfy the first and second constraints in (9). To satisfy these two constraints, we can normalize each row of $\mathbf{U}^{(t)}$ by

$$u^{(t)}_{nk} \leftarrow \frac{u^{(t)}_{nk}}{\sum_{k=1}^{K} u^{(t)}_{nk}}. \qquad (19)$$

The pseudocode of SGDClus is given in Algorithm 1.
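For illustration, here is a minimal NumPy sketch of one SGDClus update of mode $t$, assuming the COO representation (coords, values) from Section 4.1; the nonnegativity clip before (19) is our addition so that each row remains a valid probability vector (this is a sketch, not the authors' code):

```python
import numpy as np

def mttkrp_sparse(coords, values, factors, t):
    """X_(t) times the Khatri-Rao product of all U^(i), i != t, accumulated
    over the nonzero elements only, as in (29)."""
    N_t, K = factors[t].shape
    out = np.zeros((N_t, K))
    for idx, val in zip(coords, values):
        row = val * np.ones(K)
        for i, U in enumerate(factors):
            if i != t:
                row = row * U[idx[i], :]   # product over the other modes
        out[idx[t], :] += row
    return out

def sgdclus_step(coords, values, factors, t, eta, lam):
    """One update of U^(t) by rule (18), followed by row normalization (19)."""
    K = factors[t].shape[1]
    gamma = np.ones((K, K))
    for i, U in enumerate(factors):    # Gamma^(t) as the Hadamard product (16)
        if i != t:
            gamma = gamma * (U.T @ U)
    grad = (-mttkrp_sparse(coords, values, factors, t)
            + factors[t] @ gamma + lam * factors[t])       # gradient (17)
    U = factors[t] - eta * grad                            # update (12)/(18)
    U = np.clip(U, 0.0, None)                              # our addition: keep entries nonnegative
    return U / (U.sum(axis=1, keepdims=True) + 1e-12)      # normalization (19)
```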

4.3.2. SOSClus. In SOSClus, we apply the second-order stochastic algorithm to the clustering problem in (9). According to (11), each factor matrix $\mathbf{U}^{(t)}$, for $t = 1, 2, \ldots, T$, is updated by the rule

$$\mathbf{U}^{(t)} \leftarrow \mathbf{U}^{(t)} - \eta \left( \frac{\partial^2 \mathcal{L}}{\partial^2 \mathbf{U}^{(t)}} \right)^{-1} \frac{\partial \mathcal{L}}{\partial \mathbf{U}^{(t)}}. \qquad (20)$$

According to (17), the second-order partial derivative of $\mathcal{L}$ with respect to $\mathbf{U}^{(t)}$ is

$$\frac{\partial^2 \mathcal{L}}{\partial^2 \mathbf{U}^{(t)}} = \Gamma^{(t)} + \lambda\mathbf{I}. \qquad (21)$$

By substituting (17) and (21) into (20), we have

$$\mathbf{U}^{(t)} \leftarrow (1 - \eta)\,\mathbf{U}^{(t)} + \eta\,\mathbf{X}_{(t)}\left(\odot^{(t)}\mathbf{U}\right)\left(\Gamma^{(t)} + \lambda\mathbf{I}\right)^{-1}. \qquad (22)$$

Algorithm 2: The SOSClus algorithm.

Input: $\mathcal{X}$, $K$, $\lambda$
Output: $\{\mathbf{U}^{(t)}\}_{t=1}^{T}$
(1) Initialize $\{\mathbf{U}^{(t)}\}_{t=1}^{T}$
(2) Set iter ← 1
(3) repeat
(4)   for $t$ ← 1 to $T$ do
(5)     Set $\eta$
(6)     Compute $\mathbf{U}^{(t)}_{\mathrm{opt}}$ according to (23)
(7)     Update $\mathbf{U}^{(t)}$ according to (24)
(8)     Normalize $\mathbf{U}^{(t)}$ according to (19)
(9)   end for
(10)  Set iter ← iter + 1
(11) until $\mathcal{L}(\mathcal{X}, \mathbf{U}^{(1)}, \ldots, \mathbf{U}^{(T)})$ is unchanged or iter reaches the maximum number of iterations

Note that $\mathbf{X}_{(t)}(\odot^{(t)}\mathbf{U})(\Gamma^{(t)} + \lambda\mathbf{I})^{-1}$ is the General Gradient-based Optimization (OPT) [39] solution for updating $\mathbf{U}^{(t)}$ in the regularized CP decomposition, obtained by setting (17) equal to zero; see the details of the proof in [39]. Let

$$\mathbf{U}^{(t)}_{\mathrm{opt}} = \mathbf{X}_{(t)}\left(\odot^{(t)}\mathbf{U}\right)\left(\Gamma^{(t)} + \lambda\mathbf{I}\right)^{-1}. \qquad (23)$$

Then the updating rule of $\mathbf{U}^{(t)}$ in SOSClus is, intuitively, a weighted average of the current solution and the OPT solution; that is,

$$\mathbf{U}^{(t)} \leftarrow (1 - \eta)\,\mathbf{U}^{(t)} + \eta\,\mathbf{U}^{(t)}_{\mathrm{opt}}. \qquad (24)$$

Actually, SOSClus is a generalization of both the General Gradient-based Optimization (OPT) [39] and the ALS with step size restriction in the randomized block sampling method [41]. In (24), when the learning rate $\eta = 1$, we obtain the OPT solution; when the regularization parameter $\lambda = 0$, SOSClus becomes the ALS with step size restriction in the randomized block sampling method.

Similar to SGDClus, the $\{\mathbf{U}^{(t)}\}_{t=1}^{T}$ derived by (24) in SOSClus also do not satisfy the first and second constraints in (9); we should normalize each row of $\mathbf{U}^{(t)}$ according to (19). The pseudocode of SOSClus is given in Algorithm 2.
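A corresponding sketch of the SOSClus update, reusing mttkrp_sparse from the SGDClus sketch above (again our own NumPy illustration under the same assumptions):

```python
import numpy as np

def sosclus_step(coords, values, factors, t, eta, lam):
    """One update of U^(t): blend the current solution with the OPT
    solution (23) using the learning rate, cf. (24), then normalize (19)."""
    K = factors[t].shape[1]
    gamma = np.ones((K, K))
    for i, U in enumerate(factors):    # Gamma^(t) as in (16)
        if i != t:
            gamma = gamma * (U.T @ U)
    rhs = mttkrp_sparse(coords, values, factors, t)
    # U_opt = X_(t)(Khatri-Rao product)(Gamma + lambda*I)^{-1}, cf. (23);
    # solving a symmetric K x K system instead of forming the inverse.
    U_opt = np.linalg.solve(gamma + lam * np.eye(K), rhs.T).T
    U = (1.0 - eta) * factors[t] + eta * U_opt             # weighted average (24)
    U = np.clip(U, 0.0, None)                              # our addition, as before
    return U / (U.sum(axis=1, keepdims=True) + 1e-12)      # normalization (19)
```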

4.4. Feasibility Analysis

Theorem 4. The CP decomposition of $\mathcal{X}$ obtains the clustering of multityped objects in the heterogeneous information network $G = (V, E)$ simultaneously.

Proof. Since the proofs for the different types of objects in the heterogeneous information network $G = (V, E)$ are similar, we describe the process for a single type of objects. Without loss of generality, we detail the proof for the $t$th type of objects.

Given a heterogeneous information network $G = (V, E)$ and its tensor representation $\mathcal{X}$, the nonzero elements of $\mathcal{X}$ represent the input gene-networks in $G = (V, E)$, which we want to partition into $K$ clusters $\mathcal{C}_1, \mathcal{C}_2, \ldots, \mathcal{C}_K$. The centre of cluster $\mathcal{C}_k$ is denoted by $\mathbf{c}_k$. Using the coordinate format [47] as the sparse representation of $\mathcal{X}$, the gene-networks can be denoted as a matrix $\mathbf{M} \in \mathbb{R}^{nnz(\mathcal{X}) \times T}$, where $nnz(\mathcal{X})$ is the number of nonzero elements in $\mathcal{X}$. Each row $\mathbf{m}_n \in \mathbf{M}$, $n = 1, 2, \ldots, nnz(\mathcal{X})$, gives the subscripts of the corresponding nonzero element in $\mathcal{X}$. In other words, $\mathbf{m}_n$ represents a gene-network, and the entries $m_{nt} \in \mathbf{m}_n$, $t = 1, 2, \ldots, T$, are the subscripts of the objects contained in the gene-network.

The traditional clustering approach, such as K-means, minimizes the sum of differences between the individual gene-networks in each cluster and the corresponding cluster centres; that is,

$$\min_{p_{nk},\, \mathbf{c}_k} \sum_{n=1}^{nnz(\mathcal{X})} \left\| \mathbf{m}_n - \sum_{k=1}^{K} p_{nk}\, \mathbf{c}_k \right\|_F^2, \qquad (25)$$

where $p_{nk}$ is the probability of gene-network $\mathbf{m}_n$ belonging to the $k$th cluster. We can also rewrite the problem from the perspective of clustering the individual objects in each gene-network:

$$\min_{p_{nt,k},\, c_{kt}} \sum_{n=1}^{nnz(\mathcal{X})} \sum_{t=1}^{T} \left\| m_{nt} - \sum_{k=1}^{K} p_{nt,k}\, c_{kt} \right\|_F^2, \qquad (26)$$

where $p_{nt,k}$ is the probability of object $v_{tn}$ belonging to the $k$th cluster.

In matrix form, K-means can be formalized as

$$\min_{\mathbf{P}, \mathbf{C}} \left\| \mathbf{M} - \mathbf{P}\mathbf{C} \right\|_F^2, \qquad (27)$$

where $\mathbf{P}$ is the cluster indication matrix and $\mathbf{C}$ holds the cluster centres.

By matricization of $\mathcal{X}$ along the $t$th mode, the CP decomposition of $\mathcal{X}$ in (3) can be rewritten as

$$\min_{\mathbf{U}^{(1)}, \ldots, \mathbf{U}^{(T)}} \left\| \mathbf{X}_{(t)} - \mathbf{U}^{(t)} \left( \odot^{(t)}\mathbf{U} \right)^{\top} \right\|_F^2. \qquad (28)$$

$\mathbf{X}_{(t)}$ is the matricization of $\mathcal{X}$ along the $t$th mode, and $\mathbf{M}$ is the sparse representation of $\mathcal{X}$. Let $\mathbf{U}^{(t)} = \mathbf{P}$ and $(\odot^{(t)}\mathbf{U})^{\top} = \mathbf{C}$; that is, $\mathbf{U}^{(t)}$ is the cluster indication matrix for the $t$th type of objects, and $(\odot^{(t)}\mathbf{U})^{\top}$ holds the cluster centres. So the CP decomposition in (3) is equivalent to K-means clustering for the $t$th type of objects in the heterogeneous information network $G = (V, E)$.

By matricization of $\mathcal{X}$ in (3) along different modes, we can prove that the CP decomposition is equivalent to K-means clustering for the other types of objects as well. So the CP decomposition of $\mathcal{X}$ obtains the clustering of multityped objects in the heterogeneous information network $G = (V, E)$ simultaneously.

It is worth noting that CP decomposition based clustering is a soft clustering method: the factor matrices indicate the probabilities of objects belonging to the corresponding clusters. Soft clustering is more in line with reality, because many objects in heterogeneous information networks may belong to several clusters; in other words, the clusters are overlapping. In some cases, the overlapping clusters need to be translated into nonoverlapping clusters, which can be achieved by different approaches, such as applying K-means to the rows of the factor matrices. Usually, nonoverlapping clusters can be obtained by simply assigning each object to the cluster with the largest entry in the corresponding row of the factor matrix.
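As a concrete illustration of this last point (our own snippet), the hard, nonoverlapping assignment is an argmax per row of each factor matrix:

```python
import numpy as np

def hard_assignments(factors):
    """Map soft factor matrices to nonoverlapping clusters: each object
    goes to the cluster with the largest probability in its row."""
    return [np.argmax(U, axis=1) for U in factors]
```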

4.5. Time Complexity Analysis. The main time consumption of updating each factor matrix $\mathbf{U}^{(t)}$ in SGDClus is computing $\partial \mathcal{L} / \partial \mathbf{U}^{(t)}$. According to (17), we need to calculate $\mathbf{X}_{(t)}(\odot^{(t)}\mathbf{U})$ and $\mathbf{U}^{(t)}\Gamma^{(t)}$, respectively. Since $\mathbf{U}^{(t)} \in \mathbb{R}^{N_t \times K}$ and $\mathbf{X}_{(t)} \in \mathbb{R}^{N_t \times \prod_{i=1, i \neq t}^{T} N_i}$, we have $\mathbf{X}_{(t)}(\odot^{(t)}\mathbf{U}) \in \mathbb{R}^{N_t \times K}$, $\Gamma^{(t)} \in \mathbb{R}^{K \times K}$, and $\mathbf{U}^{(t)}\Gamma^{(t)} \in \mathbb{R}^{N_t \times K}$.

Firstly, if we successively calculated the Khatri-Rao product of $T - 1$ matrices followed by a matrix-matrix multiplication when computing $\mathbf{X}_{(t)}(\odot^{(t)}\mathbf{U})$, the intermediate results would be of very large size and the computational cost would be very expensive. In practice, we can reduce the complexity by skipping unnecessary calculations. Consider an element of $\mathbf{X}_{(t)}(\odot^{(t)}\mathbf{U})$; that is,

$$\left(\mathbf{X}_{(t)}\left(\odot^{(t)}\mathbf{U}\right)\right)_{n_t k} = \sum_{\{n_i\}_{i=1, i \neq t}^{T}} \left( x_{n_1 n_2 \cdots n_T} \prod_{i=1, i \neq t}^{T} u^{(i)}_{n_i k} \right). \qquad (29)$$

Here $x_{n_1 n_2 \cdots n_T}$ is an element of the matricization of $\mathcal{X}$, which represents a corresponding gene-network in the heterogeneous information network. When $x_{n_1 n_2 \cdots n_T} = 0$, the subsequent Khatri-Rao product can be skipped. Hence, only the nonzero elements of $\mathcal{X}$ need to be processed, and the time complexity of computing $\mathbf{X}_{(t)}(\odot^{(t)}\mathbf{U})$ is $O(nnz(\mathcal{X}) N_t K)$.
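For contrast, a naive evaluation would first materialize the full Khatri-Rao product, whose row count is $\prod_{i \neq t} N_i$; the following helper (our illustration) shows exactly the intermediate blow-up that the sparse accumulation in (29) sidesteps:

```python
import numpy as np

def khatri_rao(matrices):
    """Column-wise Kronecker (Khatri-Rao) product of matrices sharing the
    column count K; the result has prod(I_n) rows, which is the large
    intermediate avoided by the sparse scheme in (29)."""
    K = matrices[0].shape[1]
    out = matrices[0]
    for M in matrices[1:]:
        out = np.einsum('ik,jk->ijk', out, M).reshape(-1, K)
    return out
```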

Secondly, the element of $\mathbf{U}^{(t)}\Gamma^{(t)}$ is given by

$$\left(\mathbf{U}^{(t)}\Gamma^{(t)}\right)_{n_t k} = \sum_{j=1}^{K} \left( u^{(t)}_{n_t j} \prod_{i=1, i \neq t}^{T} \sum_{n_i=1}^{N_i} u^{(i)}_{n_i j} u^{(i)}_{n_i k} \right), \qquad (30)$$

so the time complexity of computing $\mathbf{U}^{(t)}\Gamma^{(t)}$ is $O((N - N_t)K^2)$, where $N = \sum_{t=1}^{T} N_t$ is the total number of objects in the network.

Overall, the time complexity of each iteration of SGDClus is $O(nnz(\mathcal{X})NK + (T-1)NK^2)$. Note that in real-world heterogeneous information networks, the number of clusters $K$ and the number of object types $T$ are usually far smaller than $N$; that is, $K \ll N$ and $T \ll N$.

According to (22), the time complexity of updating each factor matrix $\mathbf{U}^{(t)}$ in SOSClus is composed of three components: $\mathbf{X}_{(t)}(\odot^{(t)}\mathbf{U})$, $(\Gamma^{(t)} + \lambda\mathbf{I})^{-1}$, and their product. Compared with SGDClus, only the cost of inverting $(\Gamma^{(t)} + \lambda\mathbf{I})$ is additional in SOSClus. Since $(\Gamma^{(t)} + \lambda\mathbf{I})$ is a $K \times K$ square matrix, computing its inverse costs $O(K^3)$. Nevertheless, $O(K^3)$ is usually negligible because $K \ll N$.

5. Experiments and Results

In this section, we present several experiments on synthetic and real-world heterogeneous information network datasets and compare the performance with a number of state-of-the-art clustering methods.

5.1. Evaluation Metrics and Experimental Setting

5.1.1. Evaluation Metrics. In the experiments, we adopt the Normalized Mutual Information (NMI) [48] and Accuracy (AC) as our performance measures.

NMI is used to measure the mutual dependence between the clustering result and the ground truth. Given $N$ objects, $K$ clusters, one clustering result, and the ground truth classes for the objects, let $n(i, j)$, $i, j = 1, 2, \ldots, K$, be the number of objects labeled $i$ in the clustering result but labeled $j$ in the ground truth. The joint distribution can be defined as $p(i, j) = n(i, j)/N$, the marginal distribution of rows as $p_1(j) = \sum_{i=1}^{K} p(i, j)$, and the marginal distribution of columns as $p_2(i) = \sum_{j=1}^{K} p(i, j)$. Then the NMI is defined as

$$\mathrm{NMI} = \frac{\sum_{i=1}^{K} \sum_{j=1}^{K} p(i, j) \log\left( p(i, j) / \left( p_1(j)\, p_2(i) \right) \right)}{\sqrt{\sum_{j=1}^{K} p_1(j) \log p_1(j) \sum_{i=1}^{K} p_2(i) \log p_2(i)}}. \qquad (31)$$

The NMI ranges from 0 to 1; the larger the NMI, the better the clustering result.

AC is used to compute the clustering accuracy, which measures the percentage of correct clustering results. AC is defined as

$$\mathrm{AC} = \frac{\sum_{t=1}^{T} \sum_{n=1}^{N_t} \delta\left( \mathrm{map}(v_{tn}), \mathrm{label}(v_{tn}) \right)}{\sum_{t=1}^{T} N_t}, \qquad (32)$$

where $\mathrm{map}(v_{tn})$ is the cluster label of object $v_{tn}$, $\mathrm{label}(v_{tn})$ is the ground truth class of object $v_{tn}$, and $\delta(\cdot)$ is an indicator function:

$$\delta(\cdot) = \begin{cases} 1, & \text{if } \mathrm{map}(v_{tn}) = \mathrm{label}(v_{tn}), \\ 0, & \text{if } \mathrm{map}(v_{tn}) \neq \mathrm{label}(v_{tn}). \end{cases} \qquad (33)$$

Since both NMI and AC measure the performance of clustering a single type of objects, the weighted averages of NMI and AC are also used to measure the overall performance of the proposed algorithms and the other state-of-the-art methods:

$$\overline{\mathrm{NMI}} = \frac{\sum_{t=1}^{T} N_t\, (\mathrm{NMI})_t}{\sum_{t=1}^{T} N_t}, \qquad \overline{\mathrm{AC}} = \frac{\sum_{t=1}^{T} N_t\, (\mathrm{AC})_t}{\sum_{t=1}^{T} N_t}. \qquad (34)$$
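A direct NumPy transcription of (31)–(33) (our own sketch; it assumes the predicted cluster labels have already been mapped onto the ground-truth class labels):

```python
import numpy as np

def nmi(pred, truth, K):
    """Normalized Mutual Information as in (31); labels in {0, ..., K-1}."""
    N = len(pred)
    p = np.zeros((K, K))
    for i, j in zip(pred, truth):
        p[i, j] += 1.0 / N                  # joint distribution p(i, j)
    p1 = p.sum(axis=0)                      # marginal over rows, p1(j)
    p2 = p.sum(axis=1)                      # marginal over columns, p2(i)
    pp = p2[:, None] * p1[None, :]          # product p2(i) * p1(j)
    mask = p > 0
    num = np.sum(p[mask] * np.log(p[mask] / pp[mask]))
    den = np.sqrt(np.sum(p1[p1 > 0] * np.log(p1[p1 > 0]))
                  * np.sum(p2[p2 > 0] * np.log(p2[p2 > 0])))
    return num / den

def accuracy(pred, truth):
    """Clustering accuracy as in (32)/(33) for already-mapped labels."""
    return float(np.mean(np.asarray(pred) == np.asarray(truth)))
```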

Table 1: The synthetic datasets.

Synthetic datasets | T | K | S | D
Syn1 | 2 | 2 | 1M = 1000 × 1000 | 0.1
Syn2 | 2 | 4 | 10M = 1000 × 10000 | 0.01
Syn3 | 4 | 2 | 100M = 100 × 100 × 100 × 100 | 0.1
Syn4 | 4 | 4 | 1000M = 100 × 100 × 100 × 1000 | 0.01

$T$ is the number of object types in the heterogeneous information network and also the number of modes of the tensor; $K$ is the number of clusters; $S$ is the network scale, $S = N_1 \times N_2 \times \cdots \times N_T$; $D$ is the density of the tensor, that is, the percentage of nonzero elements in the tensor, $D = nnz(\mathcal{X})/S$.

5.1.2. Experimental Setting. In order to compare the performance of the proposed SGDClus and SOSClus with the others impartially, all methods share a common stopping condition; that is,

$$\frac{\left| \mathcal{L}_{\mathrm{iter}} - \mathcal{L}_{\mathrm{iter}-1} \right|}{\mathcal{L}_{\mathrm{iter}-1}} \le 10^{-6}, \qquad (35)$$

or iter reaches the maximum number of iterations. $\mathcal{L}_{\mathrm{iter}}$ and $\mathcal{L}_{\mathrm{iter}-1}$ are the values of the function $\mathcal{L}$ at the current, that is, (iter)th, iteration and the previous, that is, (iter − 1)th, iteration, respectively. We set the maximum number of iterations to 1000. Throughout the experiments, the regularization parameter $\lambda$ in SGDClus and SOSClus is fixed at $\lambda = 0.001$.

All experiments are implemented in MATLAB R2015a (version 8.5.0), 64-bit, and the MATLAB Tensor Toolbox (version 2.6, http://www.sandia.gov/~tgkolda/TensorToolbox) is used in our experiments. Since heterogeneous information networks are often sparse in real-world scenarios, that is, most elements of the tensor $\mathcal{X}$ are zeros, we use the sparse format of $\mathcal{X}$ proposed in [47], which is supported by the MATLAB Tensor Toolbox. The experimental results are the average values obtained by running the algorithms ten times on the corresponding datasets.

5.2. Experiments on Synthetic Datasets

5.2.1. The Synthetic Datasets Description. The purpose of using synthetic datasets is to examine whether the proposed tensor CP decomposition clustering framework works well, since the detailed cluster structures of the synthetic datasets are known. In order to make the synthetic datasets resemble a realistic situation, we assume that the distribution of the different types of objects appearing in a gene-network follows Zipf's law (see https://en.wikipedia.org/wiki/Zipf's_law). Zipf's law is defined by $f(r; \rho, N) = r^{-\rho} / \sum_{n=1}^{N} n^{-\rho}$, where $N$ is the number of objects, $r$ is the object index, and $\rho$ is the parameter characterizing the distribution; $f(r; \rho, N)$ gives the frequency with which the $r$th object appears in a gene-network. We set $\rho = 0.95$ and generate 4 synthetic datasets with different parameters. The details of these synthetic datasets are shown in Table 1.
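A simplified sketch of such a generator (our own reading of the description; duplicate gene-networks are not deduplicated here, and the names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def zipf_pmf(N, rho=0.95):
    """f(r; rho, N) = r^(-rho) / sum_n n^(-rho), for r = 1, ..., N."""
    ranks = np.arange(1, N + 1, dtype=float)
    w = ranks ** (-rho)
    return w / w.sum()

def sample_gene_networks(mode_sizes, n_networks, rho=0.95):
    """Draw one object index per mode from the Zipf distribution to form
    synthetic gene-networks (the subscript matrix of the nonzeros)."""
    cols = [rng.choice(N_t, size=n_networks, p=zipf_pmf(N_t, rho))
            for N_t in mode_sizes]
    return np.stack(cols, axis=1)   # nnz(X) x T

# e.g., a Syn1-like network: 1000 x 1000 with roughly 10% density
coords = sample_gene_networks((1000, 1000), n_networks=100_000)
```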

5.2.2. Experimental Results. At the beginning of the experiments, we set a common learning rate $\eta = 1/(\mathrm{iter} + 1)$ for SGDClus and SOSClus. We find that SOSClus has a faster convergence speed and better robustness with respect to the learning rate, as clearly shown in Figure 3. Although SGDClus may eventually converge to a local minimum, the efficiency of the optimization near a local minimum is unsatisfactory: as shown in Figure 3, the solutions of SGDClus swing around a local minimum. This phenomenon shows that the convergence speed of SGDClus is sensitive to the choice of the learning rate $\eta$.

Figure 3: Convergence speed of SGDClus and SOSClus on the 4 synthetic datasets with a common learning rate, that is, $\eta = 1/(\mathrm{iter} + 1)$.

We then modify the learning rate for SGDClus to $\eta = 1/(\mathrm{iter} + c)$, where $c$ is a constant optimized in the experiments; in practice, $c = 2.7855$ for SGDClus running on Syn3 and $c = 43.0245$ for SGDClus running on Syn4. The performance comparison of SGDClus with learning rate $\eta = 1/(\mathrm{iter} + c)$ and SOSClus with learning rate $\eta = 1/(\mathrm{iter} + 1)$ on Syn3 and Syn4 is shown in Figure 4. By employing the optimized learning rate, SGDClus converges to a local minimum quickly; however, compared with SOSClus, SGDClus still has no advantage. The hand-drawn blue circles over the curve of SOSClus in Figure 4 show that SOSClus can escape from a local minimum and find the global minimum, while SGDClus just settles in the first local minimum it reaches.

According to (23) and (24), we obtain the solutions of OPT as a by-product of running SOSClus. So we compare the AC and NMI of OPT, SGDClus, and SOSClus on the 4 synthetic datasets, as shown in Figure 5. With the increase of the number of object types in the heterogeneous information networks, the AC and NMI of SOSClus and OPT increase distinctly, while the performance of SGDClus barely changes. When $T = 2$, the AC and NMI of the three methods on Syn1 and Syn2 are almost equal and low; however, the AC and NMI of SOSClus increase to 1 when $T = 4$. Since the histograms of OPT, SGDClus, and SOSClus on Syn1 and Syn2 are almost the same, and those on Syn3 and Syn4 are also similar, the parameters $K$ and $S$ have no significant effect on the performance. Generally, a larger density $D$ and more object types $T$ in the network result in higher AC and NMI for SOSClus.

Overall, in the experiments on the 4 synthetic datasets, SOSClus shows excellent performance: it has a faster convergence speed and better robustness with respect to the learning rate, and it performs better on AC and NMI because it can escape from a local minimum and find the global minimum.

5.3. Performance Comparison on a Real-World Dataset

5.3.1. Real-World Dataset Description. The experiments on the real-world dataset are used to compare the performance of the tensor CP decomposition clustering framework with other state-of-the-art methods.

The real-world dataset extracted from the DBLP database is the DBLP-four-area dataset, which can be downloaded from http://web.cs.ucla.edu/~yzsun/data/DBLP_four_area.zip. It is a four-research-area subset of DBLP and was used in [2, 3, 12, 13, 15, 16, 18]. The four research areas in the DBLP-four-area dataset are database (DB), data mining (DM), machine learning (ML), and information retrieval (IR). There are five representative conferences in each area, and all related authors, papers published in these conferences, and terms contained in these papers' titles are included. The DBLP-four-area dataset contains 14,376 papers (100 labeled), 14,475 authors (4,057 labeled), 20 labeled conferences, and 8,920 terms. The density of the DBLP-four-area dataset is $9.01935 \times 10^{-9}$, so we construct a 4-mode tensor of size $14376 \times 14475 \times 20 \times 8920$ with 334,832 nonzero elements. We compare the performance of the tensor CP decomposition clustering framework with several other methods on the labeled records in this dataset.

Figure 4: Convergence speed of SGDClus and SOSClus on Syn3 and Syn4 with different learning rates. The learning rate for SOSClus is $\eta = 1/(\mathrm{iter} + 1)$, while that for SGDClus is $\eta = 1/(\mathrm{iter} + c)$, where $c$ is a constant optimized in the experiments.

Figure 5: The AC and NMI of OPT, SGDClus, and SOSClus on the 4 synthetic datasets.

5.3.2. Comparative Methods

(i) NetClus (see [2]). An extended version of RankClus [1], which can deal with networks that follow the star network schema. The time complexity of NetClus for clustering each object type in each iteration is $O(K|E| + (K^2 + K)N)$, where $K$ is the number of clusters, $|E|$ is the number of edges in the network, and $N$ is the total number of objects in the network.

(ii) PathSelClus (see [15, 16]). A clustering method based on predefined metapaths, which requires user guidance. In PathSelClus, the distance between objects of the same type is measured by PathSim [3], and the method starts from seeds given by the user. The time complexity of PathSelClus for clustering each object type in each iteration is $O((K + 1)|\mathcal{P}| + KN)$, where $|\mathcal{P}|$ is the number of metapath instances in the network. In addition, the time complexity of PathSim, as used by PathSelClus, for clustering each object type is $O(Nd)$, where $d$ is the average degree of the objects.

(iii) FctClus (see [13]) It is a recently proposed clusteringmethod for heterogeneous information networks As withNetClus the FctClus method can deal with networks follow-ing the star network schema The time complexity of FctClusfor clustering each object type in each iteration is 119874(119870|119864| +119879119870119873)533 Experimental Results As the baseline methods canonly deal with specific schema heterogeneous informationnetworks here we must construct different subnetworks forthem For NetClus and FctClus the network is organized as astar network schema like in [2 13] where the paper (P) is thecentre type and author (A) conference (C) and term (T) are


Table 2: AC of experiments on the DBLP-four-area dataset.

AC           OPT      SGDClus   SOSClus   NetClus   PathSelClus   FctClus
Paper        0.5882   0.8476    0.9007    0.7154    0.7551        0.7887
Author       0.5872   0.8486    0.9486    0.7177    0.7951        0.8008
Conference   1        0.99      1         0.9172    0.9950        0.9031
avg(AC)      0.5892   0.8493    0.9477    0.7186    0.7951        0.8010

Table 3: NMI of experiments on the DBLP-four-area dataset.

NMI          OPT      SGDClus   SOSClus   NetClus   PathSelClus   FctClus
Paper        0.6557   0.6720    0.8812    0.5402    0.6142        0.7152
Author       0.6539   0.8872    0.8822    0.5488    0.6770        0.6012
Conference   1        0.8497    1         0.8858    0.9906        0.8248
avg(NMI)     0.6556   0.8778    0.8827    0.5503    0.6770        0.6050

Table 4: Running time of experiments on the DBLP-four-area dataset.

Running time (s)   OPT    SGDClus   SOSClus   NetClus   PathSelClus   FctClus
Paper              n/a    n/a       n/a       8026      5423          8084
Author             n/a    n/a       n/a       7437      6811          7749
Conference         n/a    n/a       n/a       6584      6293          6698
Total time         6726   4324     8184      22047     18527         22531
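For reference, here is a compact sketch of our own showing how the AC and NMI entries for one object type can be computed from predicted labels and ground truth, following the definitions of Section 5.1.1 (the cluster-to-class mapping is assumed to have been applied to `pred` already).

```python
import numpy as np

# Hedged sketch (our own code): AC and NMI for one object type.
def ac_nmi(pred, truth, K):
    ac = np.mean(pred == truth)
    p = np.zeros((K, K))
    for i, j in zip(pred, truth):           # joint distribution p(i, j)
        p[i, j] += 1
    p /= p.sum()
    p1, p2 = p.sum(axis=0), p.sum(axis=1)   # marginals over clusters / classes
    mask = p > 0
    mi = np.sum(p[mask] * np.log(p[mask] / (p2[:, None] * p1[None, :])[mask]))
    h1 = -np.sum(p1[p1 > 0] * np.log(p1[p1 > 0]))
    h2 = -np.sum(p2[p2 > 0] * np.log(p2[p2 > 0]))
    return ac, mi / np.sqrt(h1 * h2)

print(ac_nmi(np.array([0, 0, 1, 1]), np.array([0, 0, 1, 0]), 2))
```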

We model the DBLP-four-area dataset as a 4-mode tensor, where each mode represents one object type. The 4 modes are author (A), paper (P), conference (C), and term (T), respectively; actually, the order of the object types is insignificant. Each element of the tensor represents a gene-network in the heterogeneous information network. In the experiments, we set the learning rate for SOSClus to be $\eta = 1/(\text{iter} + 1)$ and an optimized learning rate $\eta = 1/(\text{iter} + c)$ with $c = 1000125$ for SGDClus. By running SOSClus, we accessorily obtain the solutions of OPT. So we compare the experimental results of OPT, SGDClus, and SOSClus on the DBLP-four-area dataset with the three baseline methods. See the details in Tables 2, 3, and 4.
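The following is a hedged sketch of one SOSClus sweep on a toy dense 3-mode tensor, following the weighted-average update rule and row normalization described in Section 4.3. It is our own NumPy illustration with toy sizes, not the authors' MATLAB implementation; the small nonnegativity clip is our own safeguard.

```python
import numpy as np

# Hedged sketch of one SOSClus sweep:
# U(t) <- (1 - eta) * U(t) + eta * X(t) (Khatri-Rao) (Gamma + lambda*I)^{-1},
# followed by normalizing each row of U(t) to sum to 1.
def khatri_rao(mats):
    out = mats[0]
    for M in mats[1:]:                      # column-wise Kronecker product
        out = np.einsum('ir,jr->ijr', out, M).reshape(-1, out.shape[1])
    return out

def unfold(X, mode):                        # C-order mode-n matricization
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

rng = np.random.default_rng(0)
shape, K, lam, eta = (5, 4, 3), 2, 1e-3, 0.5
X = (rng.random(shape) < 0.2).astype(float)          # toy indicator tensor
U = [rng.random((s, K)) for s in shape]              # factor matrices

for t in range(len(shape)):
    others = [U[i] for i in range(len(shape)) if i != t]  # ordering matches unfold()
    Gamma = np.ones((K, K))
    for i, Ui in enumerate(U):
        if i != t:
            Gamma *= Ui.T @ Ui                       # Hadamard product of Gram matrices
    U_opt = unfold(X, t) @ khatri_rao(others) @ np.linalg.inv(Gamma + lam * np.eye(K))
    U[t] = (1 - eta) * U[t] + eta * U_opt            # SOSClus weighted average
    U[t] = np.clip(U[t], 1e-12, None)                # our safeguard before normalizing
    U[t] /= U[t].sum(axis=1, keepdims=True)          # each row sums to 1
```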

In Tables 2 and 3, SOSClus performs best on average AC and NMI, and SGDClus takes the second place. All methods achieve satisfactory AC and NMI on the conferences, since there are only 20 conferences in the network. SGDClus takes the shortest running time, and OPT and SOSClus have an obvious advantage in running time compared with the other baselines. The time complexity of SGDClus is $O(nnz(\mathcal{X})NK + (T - 1)K^2 N)$, and the time complexity of SOSClus is $O(nnz(\mathcal{X})NK + (T - 1)K^2 N + K^3)$, where $nnz(\mathcal{X})$ is the number of nonzero elements in $\mathcal{X}$, that is, the number of gene-networks in the heterogeneous information network. We have $nnz(\mathcal{X}) < |\mathcal{P}| \ll |E|$, $K \ll N$, and $T \ll N$. Compared with the time complexity of the three baselines, SGDClus and SOSClus have a slight disadvantage. However, it is worth noting that the three baselines can only cluster one type of objects in the network in each run, while OPT, SGDClus, and SOSClus can obtain the clusters of all types of objects simultaneously by running once. This is the reason why only the total time is shown for OPT, SGDClus, and SOSClus in Table 4. Moreover, the total time of OPT, SGDClus, and SOSClus is on the same order of magnitude as the running time of the other baselines for clustering each single type of objects, which is consistent with the comparison of the time complexities.
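The $nnz(\mathcal{X})$-dependent term arises because the matricized-tensor-times-Khatri-Rao product $\mathbf{X}_{(t)}(\odot^{(t)}\mathbf{U})$ only needs to be evaluated over stored nonzero elements. A hedged NumPy sketch of that sparse computation (our own illustration with toy inputs):

```python
import numpy as np

# Hedged sketch (ours): mode-t MTTKRP evaluated only over stored COO entries,
# the source of the nnz(X)-dependent term in the complexity analysis.
def sparse_mttkrp(subs, vals, U, t):
    # subs: (nnz, T) integer subscripts; vals: (nnz,); U: list of factor matrices
    K = U[0].shape[1]
    rows = np.asarray(vals, dtype=float)[:, None] * np.ones((len(vals), K))
    for mode, Um in enumerate(U):
        if mode != t:
            rows *= Um[subs[:, mode], :]      # per-nonzero Khatri-Rao row products
    out = np.zeros((U[t].shape[0], K))
    np.add.at(out, subs[:, t], rows)          # scatter-add into mode-t slots
    return out

U = [np.ones((5, 2)), np.ones((4, 2)), np.ones((3, 2))]
subs = np.array([[0, 1, 2], [4, 0, 1]])
print(sparse_mttkrp(subs, np.ones(2), U, 0))  # shape (5, 2)
```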

6. Conclusion

In this paper, a tensor CP decomposition method for clustering heterogeneous information networks is presented. In the tensor CP decomposition clustering framework, each type of objects in a heterogeneous information network is modeled as one mode of a tensor, and the gene-networks in the network are modeled as the elements of the tensor. In other words, the tensor CP decomposition clustering framework can model different types of objects and semantic relations in the heterogeneous information network without the restriction of network schema. In addition, two stochastic gradient descent algorithms, named SGDClus and SOSClus, are designed. SGDClus and SOSClus can cluster all types of objects and the gene-networks simultaneously by running once. The proposed algorithms outperformed other state-of-the-art clustering methods in terms of AC, NMI, and running time.

Notations

$a$: A scalar (lowercase letter)
$\mathbf{a}$: A vector (boldface lowercase letter)
$\mathbf{A}$: A matrix (boldface capital letter)
$\mathcal{X}$: A tensor (calligraphic letter)


$x_{i_1 i_2 \cdots i_N}$: The $(i_1, i_2, \ldots, i_N)$th element of an $N$th-order tensor $\mathcal{X}$
$\mathbf{X}_{(n)}$: Matricization of $\mathcal{X}$ along the $n$th mode
$\ast$: Hadamard product (element-wise product) of two tensors (or matrices or vectors) with the same dimension
$\odot$: Khatri-Rao product of two matrices
$\|\cdot\|_F$: Frobenius norm of a tensor (or matrix or vector)
$\mathbf{a} \circ \mathbf{b}$: Outer product of two vectors
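For readers less familiar with this notation, a brief illustrative sketch of our own (NumPy) of the Hadamard product, the Khatri-Rao product, the outer product, and mode-$n$ matricization:

```python
import numpy as np

# Illustrative only: the four tensor operations from the notation list.
A = np.arange(6.0).reshape(3, 2)
B = np.arange(6.0, 12.0).reshape(3, 2)

hadamard = A * B                                            # element-wise product
khatri_rao = np.einsum('ir,jr->ijr', A, B).reshape(-1, 2)   # column-wise Kronecker
outer = np.outer(A[:, 0], B[:, 0])                          # a o b for two columns

X = np.arange(24.0).reshape(2, 3, 4)                 # a 3rd-order tensor
X_2 = np.moveaxis(X, 1, 0).reshape(3, -1)            # matricization along mode 2 (axis 1)
print(hadamard.shape, khatri_rao.shape, outer.shape, X_2.shape)
```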

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported by the National Natural Science Foundation of China (nos. 61401482 and 61401483).

References

[1] Y. Sun, J. Han, P. Zhao, Z. Yin, H. Cheng, and T. Wu, "RankClus: integrating clustering with ranking for heterogeneous information network analysis," in Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology (EDBT '09), pp. 565–576, Saint Petersburg, Russia, March 2009.

[2] Y. Sun, Y. Yu, and J. Han, "Ranking-based clustering of heterogeneous information networks with star network schema," in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '09), pp. 797–805, Paris, France, July 2009.

[3] Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu, "PathSim: meta path-based top-K similarity search in heterogeneous information networks," PVLDB, vol. 4, no. 11, pp. 992–1003, 2011.

[4] J. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics, pp. 281–297, University of California Press, Berkeley, Calif, USA, 1967.

[5] S. Guha, R. Rastogi, and K. Shim, "Cure: an efficient clustering algorithm for large databases," Information Systems, vol. 26, no. 1, pp. 35–58, 2001.

[6] M. Ester, H. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96), E. Simoudis, J. Han, and U. M. Fayyad, Eds., pp. 226–231, AAAI Press, Portland, Ore, USA, 1996.

[7] W. Wang, J. Yang, and R. R. Muntz, "STING: a statistical information grid approach to spatial data mining," in Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB '97), M. Jarke, M. J. Carey, K. R. Dittrich, F. H. Lochovsky, P. Loucopoulos, and M. A. Jeusfeld, Eds., pp. 186–195, Morgan Kaufmann, Athens, Greece, August 1997, http://www.vldb.org/conf/1997/P186.PDF.

[8] E. H. Ruspini, "New experimental results in fuzzy clustering," Information Sciences, vol. 6, no. 73, pp. 273–284, 1973.

[9] U. von Luxburg, "A tutorial on spectral clustering," Statistics and Computing, vol. 17, no. 4, pp. 395–416, 2007.

[10] C. Shi, Y. Li, J. Zhang, Y. Sun, and P. S. Yu, "A survey of heterogeneous information network analysis," IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 1, pp. 17–37, 2017.

[11] J. Han, Y. Sun, X. Yan, and P. S. Yu, "Mining knowledge from data: an information network analysis approach," in Proceedings of the IEEE 28th International Conference on Data Engineering (ICDE '12), pp. 1214–1217, IEEE, Washington, DC, USA, April 2012.

[12] J. Chen, W. Dai, Y. Sun, and J. Dy, "Clustering and ranking in heterogeneous information networks via gamma-poisson model," in Proceedings of the SIAM International Conference on Data Mining (SDM '15), pp. 424–432, May 2015.

[13] J. Yang, L. Chen, and J. Zhang, "FctClus: a fast clustering algorithm for heterogeneous information networks," PLoS ONE, vol. 10, no. 6, Article ID e0130086, 2015.

[14] C. Shi, R. Wang, Y. Li, P. S. Yu, and B. Wu, "Ranking-based clustering on general heterogeneous information networks by network projection," in Proceedings of the 23rd ACM International Conference on Information and Knowledge Management (CIKM '14), pp. 699–708, ACM, Shanghai, China, November 2014.

[15] Y. Sun, B. Norick, J. Han, X. Yan, P. S. Yu, and X. Yu, "Integrating meta-path selection with user-guided object clustering in heterogeneous information networks," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '12), pp. 1348–1356, ACM, Beijing, China, August 2012.

[16] Y. Sun, B. Norick, J. Han, X. Yan, P. S. Yu, and X. Yu, "PathSelClus: integrating meta-path selection with user-guided object clustering in heterogeneous information networks," ACM Transactions on Knowledge Discovery from Data, vol. 7, no. 3, pp. 723–724, 2013.

[17] X. Yu, Y. Sun, B. Norick, T. Mao, and J. Han, "User guided entity similarity search using meta-path selection in heterogeneous information networks," in Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM '12), pp. 2025–2029, ACM, November 2012.

[18] Y. Sun, C. C. Aggarwal, and J. Han, "Relation strength-aware clustering of heterogeneous information networks with incomplete attributes," Proceedings of the VLDB Endowment, vol. 5, no. 5, pp. 394–405, 2012.

[19] M. Zhang, H. Hu, Z. He, and W. Wang, "Top-k similarity search in heterogeneous information networks with x-star network schema," Expert Systems with Applications, vol. 42, no. 2, pp. 699–712, 2015.

[20] Y. Zhou and L. Liu, "Social influence based clustering of heterogeneous information networks," in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 338–346, ACM, Chicago, Ill, USA, August 2013.

[21] L. R. Tucker, "Some mathematical notes on three-mode factor analysis," Psychometrika, vol. 31, pp. 279–311, 1966.

[22] R. A. Harshman, "Foundations of the PARAFAC procedure: model and conditions for an 'explanatory' multi-mode factor analysis," UCLA Working Papers in Phonetics, 1969.

[23] H. A. L. Kiers, "Towards a standardized notation and terminology in multiway analysis," Journal of Chemometrics, vol. 14, no. 3, pp. 105–122, 2000.

[24] T. G. Kolda and B. W. Bader, "Tensor decompositions and applications," SIAM Review, vol. 51, no. 3, pp. 455–500, 2009.

[25] A. Cichocki, D. Mandic, L. De Lathauwer et al., "Tensor decompositions for signal processing applications: from two-way to multiway component analysis," IEEE Signal Processing Magazine, vol. 32, no. 2, pp. 145–163, 2015.

[26] W. Peng and T. Li, "Tensor clustering via adaptive subspace iteration," Intelligent Data Analysis, vol. 15, no. 5, pp. 695–713, 2011.

[27] J. Hastad, "Tensor rank is NP-complete," Journal of Algorithms: Cognition, Informatics and Logic, vol. 11, no. 4, pp. 644–654, 1990.

[28] S. Metzler and P. Miettinen, "Clustering Boolean tensors," Data Mining & Knowledge Discovery, vol. 29, no. 5, pp. 1343–1373, 2015.

[29] X. Cao, X. Wei, Y. Han, and D. Lin, "Robust face clustering via tensor decomposition," IEEE Transactions on Cybernetics, vol. 45, no. 11, pp. 2546–2557, 2015.

[30] I. Sutskever, R. Salakhutdinov, and J. B. Tenenbaum, "Modelling relational data using Bayesian clustered tensor factorization," in Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS '09), pp. 1821–1828, British Columbia, Canada, December 2009.

[31] B. Ermis, E. Acar, and A. T. Cemgil, "Link prediction in heterogeneous data via generalized coupled tensor factorization," Data Mining & Knowledge Discovery, vol. 29, no. 1, pp. 203–236, 2015.

[32] A. R. Benson, D. F. Gleich, and J. Leskovec, "Tensor spectral clustering for partitioning higher-order network structures," in Proceedings of the SIAM International Conference on Data Mining (SDM '15), pp. 118–126, Vancouver, Canada, May 2015.

[33] L. Xiong, X. Chen, T.-K. Huang, J. G. Schneider, and J. G. Carbonell, "Temporal collaborative filtering with Bayesian probabilistic tensor factorization," in Proceedings of the 10th SIAM International Conference on Data Mining (SDM '10), pp. 211–222, Columbus, Ohio, USA, May 2010.

[34] E. E. Papalexakis, L. Akoglu, and D. Ience, "Do more views of a graph help? Community detection and clustering in multi-graphs," in Proceedings of the 16th International Conference on Information Fusion (FUSION '13), pp. 899–905, Istanbul, Turkey, July 2013.

[35] M. Vandecappelle, M. Bousse, F. V. Eeghem, and L. D. Lathauwer, "Tensor decompositions for graph clustering," Internal Report 16-170, ESAT-STADIUS, KU Leuven, Leuven, Belgium, 2016, ftp://ftp.esat.kuleuven.be/pub/SISTA/sistakulak/reports/2016_Tensor_Graph_Clustering.pdf.

[36] W. Shao, L. He, and P. S. Yu, "Clustering on multi-source incomplete data via tensor modeling and factorization," in Proceedings of the 19th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD '15), pp. 485–497, Ho Chi Minh City, Vietnam, 2015.

[37] J. D. Carroll and J.-J. Chang, "Analysis of individual differences in multidimensional scaling via an n-way generalization of 'Eckart-Young' decomposition," Psychometrika, vol. 35, no. 3, pp. 283–319, 1970.

[38] L. De Lathauwer, B. De Moor, and J. Vandewalle, "On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors," SIAM Journal on Matrix Analysis and Applications, vol. 21, no. 4, pp. 1324–1342, 2000.

[39] E. Acar, D. M. Dunlavy, and T. G. Kolda, "A scalable optimization approach for fitting canonical tensor decompositions," Journal of Chemometrics, vol. 25, no. 2, pp. 67–86, 2011.

[40] S. Hansen, T. Plantenga, and T. G. Kolda, "Newton-based optimization for Kullback-Leibler nonnegative tensor factorizations," Optimization Methods & Software, vol. 30, no. 5, pp. 1002–1029, 2015.

[41] N. Vervliet and L. De Lathauwer, "A randomized block sampling approach to canonical polyadic decomposition of large-scale tensors," IEEE Journal on Selected Topics in Signal Processing, vol. 10, no. 2, pp. 284–295, 2016.

[42] R. Ge, F. Huang, C. Jin, and Y. Yuan, "Escaping from saddle points: online stochastic gradient for tensor decomposition," http://arxiv.org/abs/1503.02101.

[43] T. G. Kolda, "Multilinear operators for higher-order decompositions," Tech. Rep. SAND2006-2081, Sandia National Laboratories, 2006, http://www.osti.gov/scitech/biblio/923081.

[44] P. Paatero, "A weighted non-negative least squares algorithm for three-way 'PARAFAC' factor analysis," Chemometrics & Intelligent Laboratory Systems, vol. 38, no. 2, pp. 223–242, 1997.

[45] P. Paatero, "Construction and analysis of degenerate PARAFAC models," Journal of Chemometrics, vol. 14, no. 3, pp. 285–299, 2000.

[46] L. Bottou and N. Murata, "Stochastic approximations and efficient learning," in The Handbook of Brain Theory and Neural Networks, 2nd edition, 2002.

[47] B. W. Bader and T. G. Kolda, "Efficient MATLAB computations with sparse and factored tensors," SIAM Journal on Scientific Computing, vol. 30, no. 1, pp. 205–231, 2007.

[48] A. Strehl and J. Ghosh, "Cluster ensembles: a knowledge reuse framework for combining multiple partitions," Journal of Machine Learning Research, vol. 3, no. 3, pp. 583–617, 2003.




53 Performance Comparison on Real-World Dataset

531 Real-World Dataset Description The experiments onthe real-world dataset are used to compare the performanceof the tensor CP decomposition clustering framework withother state-of-the-art methods

The real-world dataset extracted from the DBLP databaseis the DBLP-four-area dataset which can be downloadedfrom httpwebcsuclaedusimyzsundataDBLP_four_areazip It is a four research-area subset of DBLP and is usedin [2 3 12 13 15 16 18] The four research areas in DBLP-four-area dataset are database (DB) data mining (DM)machine learning (ML) and information retrieval (IR)respectively There are five representative conferences ineach area And all related authors papers published in theseconferences and terms contained in these papersrsquo titlesare included The DBLP-four-area dataset contains 14376papers with 100 labeled 14475 authors with 4057 labeled20 labeled conferences and 8920 terms The density of theDBLP-four-area dataset is 901935 times 10minus9 so we construct a4-mode tensor with size of 14376 times 14475 times 20 times 8920 and334832 nonzero elements We compare the performance oftensor CP decomposition clustering framework with severalother methods on the labeled record in this dataset

532 Comparative Methods

(i) NetClus (see [2]) An extended version of RankClus [1]which can deal with the network follows the star networkschema The time complexity of NetClus for clustering each

10 Scientific Programming

0 10 20 30 40 50

Iteration

00

01

02

03

04

05

06

07

08

Erro

rExperiments on Syn3

SGDClusSOSClus

Escaped from a local minimum

0 10 20 30 40 50

Iteration

00

01

02

03

04

05

06

07

Erro

r

Experiments on Syn4

SGDClusSOSClus

Escaped from a local minimum

Figure 4 Convergence speed of SGDClus and SOSClus on Syn3 and Syn4 with different learning rate The learning rate for SOSClus is120578 = 1(iter + 1) while that for SGDClus is 120578 = 1(iter + 119888) where 119888 is a constant optimized in the experiments

OPTSGDClusSOSClus

Synthetic datasets

00

02

04

06

08

10

AC

Syn1 Syn2 Syn4Syn3

OPTSGDClusSOSClus

NM

I

00

02

04

06

08

10

Synthetic datasetsSyn1 Syn2 Syn4Syn3

Figure 5 The AC and NMI of OPT SGDClus and SOSClus on the 4 synthetic datasets

object type in each iteration is 119874(119870|119864| + (1198702 + 119870)119873) where119870 is the number of clusters |119864| is the number of edges in thenetwork and119873 is the total number of objects in the network

(ii) PathSelClus (see [15 16]) A clustering method based onthe predefined metapath requires a user guide In PathSel-Clus the distance between the same type objects is measuredby PathSim [3] and themethod starts with the given seeds byuser The time complexity of PathSelClus for clustering eachobject type in each iteration is 119874((119870 + 1)|P| + 119870119873) where|P| is the number of metapath instances in the networkAnd the time complexity of PathSim used by PathSelClus forclustering each object type is 119874(119873119889) where 119889 is the averagedegree of objects

(iii) FctClus (see [13]) It is a recently proposed clusteringmethod for heterogeneous information networks As withNetClus the FctClus method can deal with networks follow-ing the star network schema The time complexity of FctClusfor clustering each object type in each iteration is 119874(119870|119864| +119879119870119873)533 Experimental Results As the baseline methods canonly deal with specific schema heterogeneous informationnetworks here we must construct different subnetworks forthem For NetClus and FctClus the network is organized as astar network schema like in [2 13] where the paper (P) is thecentre type and author (A) conference (C) and term (T) are

Scientific Programming 11

Table 2 AC of experiments on DBLP-four-area dataset

AC OPT SGDClus SOSClus NetClus PathSelClus FctClusPaper 05882 08476 09007 07154 07551 07887Author 05872 08486 09486 07177 07951 08008Conference 1 099 1 09172 09950 09031avg (AC) 05892 08493 09477 07186 07951 08010

Table 3 NMI of experiments on DBLP-four-area dataset

NMI OPT SGDClus SOSClus NetClus PathSelClus FctClusPaper 06557 06720 08812 05402 06142 07152Author 06539 08872 08822 05488 06770 06012Conference 1 08497 1 08858 09906 08248avg (NMI) 06556 08778 08827 05503 06770 06050

Table 4 Running time of experiments on DBLP-four-area dataset

Running time (s) OPT SGDClus SOSClus NetClus PathSelClus FctClusPaper mdash mdash mdash 8026 5423 8084Author mdash mdash mdash 7437 6811 7749Conference mdash mdash mdash 6584 6293 6698Total time 6726 4324 8184 22047 18527 22531

For PathSelClus, we select the metapaths P-T-P, A-P-C-P-A, and C-P-T-P-C to cluster the papers, authors, and conferences, respectively. In addition, in PathSelClus we give each cluster one seed to start.
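To make the PathSim measure from [3] concrete, the following sketch (our own illustration, with hypothetical toy adjacency matrices) evaluates PathSim along the metapath A-P-C-P-A from the author-paper and paper-conference adjacency matrices:

    import numpy as np

    # Hypothetical toy 0/1 adjacency matrices: authors x papers, papers x conferences
    A_AP = np.array([[1, 1, 0],
                     [0, 1, 1]])
    A_PC = np.array([[1, 0],
                     [1, 0],
                     [0, 1]])

    # Commuting matrix of the metapath A-P-C-P-A: entry (i, j) counts the
    # metapath instances connecting author i to author j
    W = A_AP @ A_PC      # A-P-C path counts
    M = W @ W.T          # full A-P-C-P-A path counts

    # PathSim: s(i, j) = 2 * M[i, j] / (M[i, i] + M[j, j])
    diag = np.diag(M)
    pathsim = 2 * M / (diag[:, None] + diag[None, :])
    print(pathsim)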

We model the DBLP-four-area dataset as a 4-mode tensor, where each mode represents one object type. The 4 modes are author (A), paper (P), conference (C), and term (T), respectively; the order of the object types is insignificant. Each nonzero element of the tensor represents a gene-network in the heterogeneous information network. In the experiments, we set the learning rate for SOSClus to be η = 1/(iter + 1) and an optimized learning rate η = 1/(iter + c) with c = 1000125 for SGDClus. By running SOSClus, we obtain the solutions of OPT as a by-product, so we compare the experimental results of OPT, SGDClus, and SOSClus on the DBLP-four-area dataset with the three baseline methods. See the details in Tables 2, 3, and 4.
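For readers who want to trace this setup in code, here is a minimal Python/NumPy sketch of one SOSClus sweep on a small dense toy tensor; it is our own illustration, not the released implementation. It applies the update reported earlier in the paper, moving each factor matrix toward the regularized least-squares solution X(t)(⊙(t)U)(Γ(t) + λI)⁻¹ with weight η and then row-normalizing; the toy sizes and λ = 0.001 follow the experimental setting.

    import numpy as np

    def unfold(X, t):
        # Mode-t matricization: bring mode t to the front, flatten the rest (C order)
        return np.moveaxis(X, t, 0).reshape(X.shape[t], -1)

    def khatri_rao(mats):
        # Column-wise Khatri-Rao product; later matrices vary fastest, which
        # matches the column ordering produced by unfold() above
        K = mats[0].shape[1]
        out = mats[0]
        for A in mats[1:]:
            out = (out[:, None, :] * A[None, :, :]).reshape(-1, K)
        return out

    def sosclus_sweep(X, U, eta, lam):
        # One pass over all modes of the update
        # U(t) <- (1 - eta) U(t) + eta * X(t) (KR of other factors) (Gamma + lam I)^-1
        K = U[0].shape[1]
        for t in range(len(U)):
            others = [U[i] for i in range(len(U)) if i != t]
            G = unfold(X, t) @ khatri_rao(others)        # MTTKRP term X(t)(KR)
            Gamma = np.ones((K, K))
            for A in others:
                Gamma *= A.T @ A                         # Hadamard product of Gram matrices
            U_opt = G @ np.linalg.inv(Gamma + lam * np.eye(K))
            U[t] = (1 - eta) * U[t] + eta * U_opt
            U[t] /= U[t].sum(axis=1, keepdims=True) + 1e-12  # rows sum to 1
        return U

    # Toy run: random 3-mode 0/1 tensor, K = 2 clusters, eta = 1/(iter + 1)
    rng = np.random.default_rng(0)
    X = (rng.random((6, 5, 4)) < 0.2).astype(float)
    U = [rng.random((n, 2)) for n in X.shape]
    for it in range(20):
        U = sosclus_sweep(X, U, eta=1.0 / (it + 2), lam=1e-3)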

In Tables 2 and 3, SOSClus performs best on average AC and NMI, and SGDClus takes second place. All methods achieve satisfactory AC and NMI on the conference type, since there are only 20 conferences in the network. SGDClus takes the shortest running time, and OPT and SOSClus have an obvious advantage in running time over the other baselines. The time complexity of SGDClus is O(nnz(X)NK + (T − 1)K²N), and the time complexity of SOSClus is O(nnz(X)NK + (T − 1)K²N + K³), where nnz(X) is the number of nonzero elements in X, that is, the number of gene-networks in the heterogeneous information network. We have nnz(X) < |P| ≪ |E|, K ≪ N, and T ≪ N. Compared with the per-iteration time complexities of the three baselines, SGDClus and SOSClus are at a slight disadvantage. However, it is worth noting that the three baselines can cluster only one type of objects per run, while OPT, SGDClus, and SOSClus obtain the clusters of all types of objects simultaneously in a single run. This is the reason why only the total time is shown for OPT, SGDClus, and SOSClus in Table 4. Moreover, the total time of OPT, SGDClus, and SOSClus is of the same order of magnitude as the running time of the other baselines for clustering each single type of objects, which is consistent with the comparison of the time complexities.
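The nnz(X) factor in these bounds reflects that X(t)(⊙(t)U) only needs to touch the nonzero elements, as derived earlier in the paper. A minimal Python/NumPy sketch of that sparse computation over the coordinate representation (subs, vals); it is our own illustration with hypothetical toy sizes:

    import numpy as np

    def sparse_mttkrp(subs, vals, U, t):
        # Computes X(t) (Khatri-Rao of the other factors) by accumulating,
        # for each nonzero element, vals[p] times the element-wise product
        # of the matching rows of the other factor matrices
        out = np.zeros_like(U[t])
        for p in range(len(vals)):
            row = vals[p] * np.ones(U[t].shape[1])
            for i, U_i in enumerate(U):
                if i != t:
                    row = row * U_i[subs[p, i]]
            out[subs[p, t]] += row
        return out

    # Toy usage with hypothetical sizes (3 nonzeros, K = 2)
    rng = np.random.default_rng(1)
    U = [rng.random((n, 2)) for n in (4, 3, 5)]
    subs = np.array([[0, 1, 2], [3, 0, 4], [1, 2, 0]])
    vals = np.ones(3)
    print(sparse_mttkrp(subs, vals, U, 0))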

6. Conclusion

In this paper, a tensor CP decomposition method for clustering heterogeneous information networks is presented. In the tensor CP decomposition clustering framework, each type of objects in a heterogeneous information network is modeled as one mode of a tensor, and the gene-networks in the network are modeled as the elements of the tensor. In other words, the tensor CP decomposition clustering framework can model different types of objects and semantic relations in the heterogeneous information network without the restriction of network schema. In addition, two stochastic gradient descent algorithms, named SGDClus and SOSClus, are designed. SGDClus and SOSClus can cluster all types of objects and the gene-networks simultaneously in a single run. The proposed algorithms outperformed other state-of-the-art clustering methods in terms of AC, NMI, and running time.

Notations

a:                  A scalar (lowercase letter)
a:                  A vector (boldface lowercase letter)
A:                  A matrix (boldface capital letter)
X:                  A tensor (calligraphic letter)
x_{i1 i2 ... iN}:   The (i1, i2, ..., iN)th element of an Nth-order tensor X
X_(n):              Matricization of X along the nth mode
*:                  Hadamard product (element-wise product) of two tensors (or matrices or vectors) with the same dimensions
⊙:                  Khatri-Rao product of two matrices
||·||_F:            Frobenius norm of a tensor (or matrix or vector)
a ∘ b:              Outer product of two vectors

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported by the National Natural Science Foundation of China (no. 61401482 and no. 61401483).

References

[1] Y. Sun, J. Han, P. Zhao, Z. Yin, H. Cheng, and T. Wu, "RankClus: integrating clustering with ranking for heterogeneous information network analysis," in Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology (EDBT '09), pp. 565-576, Saint Petersburg, Russia, March 2009.

[2] Y. Sun, Y. Yu, and J. Han, "Ranking-based clustering of heterogeneous information networks with star network schema," in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '09), pp. 797-805, Paris, France, July 2009.

[3] Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu, "PathSim: meta path-based top-K similarity search in heterogeneous information networks," PVLDB, vol. 4, no. 11, pp. 992-1003, 2011.

[4] J. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics, pp. 281-297, University of California Press, Berkeley, Calif, USA, 1967.

[5] S. Guha, R. Rastogi, and K. Shim, "Cure: an efficient clustering algorithm for large databases," Information Systems, vol. 26, no. 1, pp. 35-58, 2001.

[6] M. Ester, H. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96), E. Simoudis, J. Han, and U. M. Fayyad, Eds., pp. 226-231, AAAI Press, Portland, Ore, USA, 1996.

[7] W. Wang, J. Yang, and R. R. Muntz, "STING: a statistical information grid approach to spatial data mining," in Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB '97), M. Jarke, M. J. Carey, K. R. Dittrich, F. H. Lochovsky, P. Loucopoulos, and M. A. Jeusfeld, Eds., pp. 186-195, Morgan Kaufmann, Athens, Greece, August 1997, http://www.vldb.org/conf/1997/P186.PDF.

[8] E. H. Ruspini, "New experimental results in fuzzy clustering," Information Sciences, vol. 6, no. 73, pp. 273-284, 1973.

[9] U. von Luxburg, "A tutorial on spectral clustering," Statistics and Computing, vol. 17, no. 4, pp. 395-416, 2007.

[10] C. Shi, Y. Li, J. Zhang, Y. Sun, and P. S. Yu, "A survey of heterogeneous information network analysis," IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 1, pp. 17-37, 2017.

[11] J. Han, Y. Sun, X. Yan, and P. S. Yu, "Mining knowledge from data: an information network analysis approach," in Proceedings of the IEEE 28th International Conference on Data Engineering (ICDE '12), pp. 1214-1217, IEEE, Washington, DC, USA, April 2012.

[12] J. Chen, W. Dai, Y. Sun, and J. Dy, "Clustering and ranking in heterogeneous information networks via gamma-poisson model," in Proceedings of the SIAM International Conference on Data Mining (SDM '15), pp. 424-432, May 2015.

[13] J. Yang, L. Chen, and J. Zhang, "FctClus: a fast clustering algorithm for heterogeneous information networks," PLoS ONE, vol. 10, no. 6, Article ID e0130086, 2015.

[14] C. Shi, R. Wang, Y. Li, P. S. Yu, and B. Wu, "Ranking-based clustering on general heterogeneous information networks by network projection," in Proceedings of the 23rd ACM International Conference on Information and Knowledge Management (CIKM '14), pp. 699-708, ACM, Shanghai, China, November 2014.

[15] Y. Sun, B. Norick, J. Han, X. Yan, P. S. Yu, and X. Yu, "Integrating meta-path selection with user-guided object clustering in heterogeneous information networks," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '12), pp. 1348-1356, ACM, Beijing, China, August 2012.

[16] Y. Sun, B. Norick, J. Han, X. Yan, P. S. Yu, and X. Yu, "PathSelClus: integrating meta-path selection with user-guided object clustering in heterogeneous information networks," ACM Transactions on Knowledge Discovery from Data, vol. 7, no. 3, pp. 723-724, 2013.

[17] X. Yu, Y. Sun, B. Norick, T. Mao, and J. Han, "User guided entity similarity search using meta-path selection in heterogeneous information networks," in Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM '12), pp. 2025-2029, ACM, November 2012.

[18] Y. Sun, C. C. Aggarwal, and J. Han, "Relation strength-aware clustering of heterogeneous information networks with incomplete attributes," Proceedings of the VLDB Endowment, vol. 5, no. 5, pp. 394-405, 2012.

[19] M. Zhang, H. Hu, Z. He, and W. Wang, "Top-k similarity search in heterogeneous information networks with x-star network schema," Expert Systems with Applications, vol. 42, no. 2, pp. 699-712, 2015.

[20] Y. Zhou and L. Liu, "Social influence based clustering of heterogeneous information networks," in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 338-346, ACM, Chicago, Ill, USA, August 2013.

[21] L. R. Tucker, "Some mathematical notes on three-mode factor analysis," Psychometrika, vol. 31, pp. 279-311, 1966.

[22] R. A. Harshman, "Foundations of the PARAFAC procedure: model and conditions for an 'explanatory' multi-mode factor analysis," UCLA Working Papers in Phonetics, 1969.

[23] H. A. L. Kiers, "Towards a standardized notation and terminology in multiway analysis," Journal of Chemometrics, vol. 14, no. 3, pp. 105-122, 2000.

[24] T. G. Kolda and B. W. Bader, "Tensor decompositions and applications," SIAM Review, vol. 51, no. 3, pp. 455-500, 2009.

[25] A. Cichocki, D. Mandic, L. De Lathauwer et al., "Tensor decompositions for signal processing applications: from two-way to multiway component analysis," IEEE Signal Processing Magazine, vol. 32, no. 2, pp. 145-163, 2015.

[26] W. Peng and T. Li, "Tensor clustering via adaptive subspace iteration," Intelligent Data Analysis, vol. 15, no. 5, pp. 695-713, 2011.

[27] J. Hastad, "Tensor rank is NP-complete," Journal of Algorithms: Cognition, Informatics and Logic, vol. 11, no. 4, pp. 644-654, 1990.

[28] S. Metzler and P. Miettinen, "Clustering Boolean tensors," Data Mining & Knowledge Discovery, vol. 29, no. 5, pp. 1343-1373, 2015.

[29] X. Cao, X. Wei, Y. Han, and D. Lin, "Robust face clustering via tensor decomposition," IEEE Transactions on Cybernetics, vol. 45, no. 11, pp. 2546-2557, 2015.

[30] I. Sutskever, R. Salakhutdinov, and J. B. Tenenbaum, "Modelling relational data using Bayesian clustered tensor factorization," in Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS '09), pp. 1821-1828, British Columbia, Canada, December 2009.

[31] B. Ermis, E. Acar, and A. T. Cemgil, "Link prediction in heterogeneous data via generalized coupled tensor factorization," Data Mining & Knowledge Discovery, vol. 29, no. 1, pp. 203-236, 2015.

[32] A. R. Benson, D. F. Gleich, and J. Leskovec, "Tensor spectral clustering for partitioning higher-order network structures," in Proceedings of the SIAM International Conference on Data Mining (SDM '15), pp. 118-126, Vancouver, Canada, May 2015.

[33] L. Xiong, X. Chen, T.-K. Huang, J. G. Schneider, and J. G. Carbonell, "Temporal collaborative filtering with Bayesian probabilistic tensor factorization," in Proceedings of the 10th SIAM International Conference on Data Mining (SDM '10), pp. 211-222, Columbus, Ohio, USA, May 2010.

[34] E. E. Papalexakis, L. Akoglu, and D. Ience, "Do more views of a graph help? Community detection and clustering in multi-graphs," in Proceedings of the 16th International Conference on Information Fusion (FUSION '13), pp. 899-905, Istanbul, Turkey, July 2013.

[35] M. Vandecappelle, M. Boussé, F. V. Eeghem, and L. De Lathauwer, "Tensor decompositions for graph clustering," Internal Report 16-170, ESAT-STADIUS, KU Leuven, Leuven, Belgium, 2016, ftp://ftp.esat.kuleuven.be/pub/SISTA/sistakulak/reports/2016_Tensor_Graph_Clustering.pdf.

[36] W. Shao, L. He, and P. S. Yu, "Clustering on multi-source incomplete data via tensor modeling and factorization," in Proceedings of the 19th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD '15), pp. 485-497, Ho Chi Minh City, Vietnam, 2015.

[37] J. D. Carroll and J.-J. Chang, "Analysis of individual differences in multidimensional scaling via an n-way generalization of 'Eckart-Young' decomposition," Psychometrika, vol. 35, no. 3, pp. 283-319, 1970.

[38] L. De Lathauwer, B. De Moor, and J. Vandewalle, "On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors," SIAM Journal on Matrix Analysis and Applications, vol. 21, no. 4, pp. 1324-1342, 2000.

[39] E. Acar, D. M. Dunlavy, and T. G. Kolda, "A scalable optimization approach for fitting canonical tensor decompositions," Journal of Chemometrics, vol. 25, no. 2, pp. 67-86, 2011.

[40] S. Hansen, T. Plantenga, and T. G. Kolda, "Newton-based optimization for Kullback-Leibler nonnegative tensor factorizations," Optimization Methods & Software, vol. 30, no. 5, pp. 1002-1029, 2015.

[41] N. Vervliet and L. De Lathauwer, "A randomized block sampling approach to canonical polyadic decomposition of large-scale tensors," IEEE Journal on Selected Topics in Signal Processing, vol. 10, no. 2, pp. 284-295, 2016.

[42] R. Ge, F. Huang, C. Jin, and Y. Yuan, "Escaping from saddle points - online stochastic gradient for tensor decomposition," http://arxiv.org/abs/1503.02101.

[43] T. G. Kolda, "Multilinear operators for higher-order decompositions," Tech. Rep. SAND2006-2081, Sandia National Laboratories, 2006, http://www.osti.gov/scitech/biblio/923081.

[44] P. Paatero, "A weighted non-negative least squares algorithm for three-way 'PARAFAC' factor analysis," Chemometrics & Intelligent Laboratory Systems, vol. 38, no. 2, pp. 223-242, 1997.

[45] P. Paatero, "Construction and analysis of degenerate PARAFAC models," Journal of Chemometrics, vol. 14, no. 3, pp. 285-299, 2000.

[46] L. Bottou and N. Murata, "Stochastic approximations and efficient learning," in The Handbook of Brain Theory and Neural Networks, 2nd edition, 2002.

[47] B. W. Bader and T. G. Kolda, "Efficient MATLAB computations with sparse and factored tensors," SIAM Journal on Scientific Computing, vol. 30, no. 1, pp. 205-231, 2007.

[48] A. Strehl and J. Ghosh, "Cluster ensembles - a knowledge reuse framework for combining multiple partitions," Journal of Machine Learning Research, vol. 3, no. 3, pp. 583-617, 2003.




[33] L Xiong X Chen T-K Huang J G Schneider and JG Carbonell ldquoTemporal collaborative filtering with Bayesianprobabilistic tensor factorizationrdquo in Proceedings of the 10thSIAM International Conference on Data Mining (SDM rsquo10) pp211ndash222 Columbus Ohio USA May 2010

[34] E E Papalexakis L Akoglu and D Ience ldquoDo more views ofa graph help Community detection and clustering in multi-graphsrdquo in Proceedings of the 16th International Conferenceof Information Fusion (FUSION rsquo13) pp 899ndash905 IstanbulTurkey July 2013

[35] M Vandecappelle M Bousse F V Eeghem and L D Lath-auwer ldquoTensor decompositions for graph clusteringrdquo Inter-nal Report 16-170 ESAT-STADIUS KU Leuven LeuvenBelgium 2016 ftpftpesatkuleuvenbepubSISTAsistakul-akreports2016_Tensor_Graph_Clusteringpdf

[36] W Shao L He and P S Yu ldquoClustering on multi-sourceincomplete data via tensor modeling and factorizationrdquo inProceedings of the 19th Pacific-Asia Conference on KnowledgeDiscovery and Data Mining (PAKDD rsquo15) pp 485ndash497 Ho ChiMinh City Vietnam 2015

[37] J D Carroll and J-J Chang ldquoAnalysis of individual differencesin multidimensional scaling via an n-way generalization ofldquoEckart-Youngrdquo decompositionrdquo Psychometrika vol 35 no 3pp 283ndash319 1970

[38] L De Lathauwer B De Moor and J Vandewalle ldquoOn thebest rank-1 and rank-(R1 R2 RN) approximation ofhigher-order tensorsrdquo SIAM Journal on Matrix Analysis andApplications vol 21 no 4 pp 1324ndash1342 2000

[39] E Acar D M Dunlavy and T G Kolda ldquoA scalable opti-mization approach for fitting canonical tensor decompositionsrdquoJournal of Chemometrics vol 25 no 2 pp 67ndash86 2011

[40] S Hansen T Plantenga and T G Kolda ldquoNewton-basedoptimization for Kullback-Leibler nonnegative tensor factor-izationsrdquo Optimization Methods amp Software vol 30 no 5 pp1002ndash1029 2015

[41] NVervliet and LDe Lathauwer ldquoA randomized block samplingapproach to canonical polyadic decomposition of large-scaletensorsrdquo IEEE Journal on SelectedTopics in Signal Processing vol10 no 2 pp 284ndash295 2016

[42] R Ge F Huang C Jin and Y Yuan ldquoEscaping from saddlepointsmdashonline stochastic gradient for tensor decompositionrdquohttparxivorgabs150302101

[43] T G Kolda ldquoMultilinear operators for higher-order decompo-sitionsrdquo Tech Rep SAND2006-2081 Sandia National Labora-tories 2006 httpwwwostigovscitechbiblio923081=0pt

[44] P Paatero ldquoA weighted non-negative least squares algorithmfor three-way lsquoPARAFACrsquo factor analysisrdquo Chemometrics ampIntelligent Laboratory Systems vol 38 no 2 pp 223ndash242 1997

[45] P Paatero ldquoConstruction and analysis of degenerate PARAFACmodelsrdquo Journal of Chemometrics vol 14 no 3 pp 285ndash2992000

[46] B L Bottou and N Murata ldquoStochastic approximations andefficient learningrdquo inTheHandbook of BrainTheory and NeuralNetworks 2nd edition 2002

[47] B W Bader and T G Kolda ldquoEfficient MATLAB computationswith sparse and factored tensorsrdquo SIAM Journal on ScientificComputing vol 30 no 1 pp 205ndash231 2007

[48] A Strehl and J Ghosh ldquoCluster ensemblesmdasha knowledgereuse framework for combining multiple partitionsrdquo Journal ofMachine Learning Research vol 3 no 3 pp 583ndash617 2003

Submit your manuscripts athttpswwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 5: A Tensor CP Decomposition Method for Clustering ...downloads.hindawi.com/journals/sp/2017/2803091.pdf · spectral clustering, are designed to analyze homogeneous informationnetworks

Scientific Programming 5

norms of all factor matrices in the optimization be equal, that is,

\| \mathbf{U}^{(1)} \|_F = \| \mathbf{U}^{(2)} \|_F = \cdots = \| \mathbf{U}^{(T)} \|_F.  (8)

Therefore, the local minima of the loss function L become isolated, and degenerate solutions obtained by permuting or rescaling a satisfactory solution are excluded from the optimization; the details of the proof can be found in [39]. Meanwhile, the Tikhonov regularization term can ensure the sparsity of the factor matrices by penalizing the number of nonzero elements.

Therefore, the tensor CP decomposition method for clustering heterogeneous information networks can be formalized as

\min_{\mathbf{U}^{(1)}, \mathbf{U}^{(2)}, \ldots, \mathbf{U}^{(T)}} \mathcal{L}(\mathcal{X}; \mathbf{U}^{(1)}, \mathbf{U}^{(2)}, \ldots, \mathbf{U}^{(T)})
\text{s.t.} \quad \sum_{k=1}^{K} u^{(t)}_{nk} = 1, \ \forall t, \forall n,
\qquad u^{(t)}_{nk} \in [0, 1], \ \forall t, \forall n, \forall k,
\qquad \sum_{n=1}^{N_t} u^{(t)}_{nk} > 0, \ \forall t, \forall k,  (9)

where n = 1, 2, ..., N_t; t = 1, 2, ..., T; k = 1, 2, ..., K; and K < min{N_1, N_2, ..., N_T} is the total number of clusters. In (9), we divide X into K clusters and obtain the structure of each cluster, which includes the distribution of each object. The first constraint in (9) guarantees that the probabilities of each object belonging to all clusters sum to 1. The second constraint in (9) requires each probability to lie in the range [0, 1]. The last constraint in (9) makes sure that there is no empty cluster in any mode.

4.3. The Stochastic Gradient Descent Algorithms. Stochastic gradient descent is a mature and widely used tool for optimizing various models in machine learning, such as artificial neural networks, support vector machines, and logistic regression. In this section, the regularized clustering problem in (9) is addressed by stochastic gradient descent algorithms. The details of the tensor algebra and properties used in this section can be found in [43].

First, we review the stochastic gradient descent algorithm. To solve an optimization problem min_x Q(x), where Q(x) is a differentiable objective function to be minimized and x is a variable, the stochastic gradient descent update of x can be described as

x \leftarrow x - \eta \nabla Q(x),  (10)

where η is a positive number named the learning rate or step size. The convergence speed of the stochastic gradient descent algorithm depends on the choice of the learning rate η and the initial solution.
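As a small illustration, the following Python sketch applies the update rule (10) to a toy quadratic objective; the function names and the objective are ours, not part of the paper:

import numpy as np

def sgd(grad, x0, eta=0.1, iters=100):
    # Iterate the update x <- x - eta * grad(x) from (10).
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - eta * grad(x)
    return x

# Toy example: minimize Q(x) = ||x - 3||^2, whose gradient is 2(x - 3).
x_min = sgd(lambda x: 2.0 * (x - 3.0), x0=[0.0])
print(x_min)  # approaches [3.0]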

Though the stochastic gradient descent algorithm may converge to a local minimum at a linear speed, the efficiency of the algorithm near the optimal point is not all roses [46]. To speed up the final optimization phase, an extension named the second-order stochastic algorithm is designed in [46], which replaces the learning rate η by the inverse of the second-order derivative of the objective function, that is,

x \leftarrow x - \eta \left( \nabla^2 Q(x) \right)^{-1} \nabla Q(x).  (11)

Now we apply the stochastic gradient descent and the second-order stochastic algorithm to the clustering problem in (9) and propose two algorithms, named SGDClus (Stochastic Gradient Descent for Clustering) and SOSClus (Second-Order Stochastic for Clustering), respectively.

4.3.1. SGDClus. In SGDClus, we apply the stochastic gradient descent algorithm to the clustering problem in (9). According to (10), each factor matrix U^{(t)}, for t = 1, 2, ..., T, is updated by the rule

\mathbf{U}^{(t)} \leftarrow \mathbf{U}^{(t)} - \eta \frac{\partial \mathcal{L}}{\partial \mathbf{U}^{(t)}} = \mathbf{U}^{(t)} - \eta \left( \frac{\partial f}{\partial \mathbf{U}^{(t)}} + \frac{\partial g}{\partial \mathbf{U}^{(t)}} \right).  (12)

The term ∂g/∂U^{(t)} is easy to obtain according to (6), that is,

\frac{\partial g}{\partial \mathbf{U}^{(t)}} = \lambda \mathbf{U}^{(t)}.  (13)

To compute ∂f/∂U^{(t)}, f(X; U^{(1)}, U^{(2)}, ..., U^{(T)}) can be rewritten by matricization of X along the tth mode as

f^{(t)} = \frac{1}{2} \left\| \mathbf{X}_{(t)} - \mathbf{U}^{(t)} (\odot^{(t)}\mathbf{U})^{\top} \right\|_F^2,  (14)

where \odot^{(t)}\mathbf{U} = \mathbf{U}^{(T)} \odot \cdots \odot \mathbf{U}^{(t+1)} \odot \mathbf{U}^{(t-1)} \odot \cdots \odot \mathbf{U}^{(1)}. Then we have

\frac{\partial f^{(t)}}{\partial \mathbf{U}^{(t)}}
= \frac{\partial \operatorname{Tr}\big( (\mathbf{X}_{(t)} - \mathbf{U}^{(t)}(\odot^{(t)}\mathbf{U})^{\top}) (\mathbf{X}_{(t)} - \mathbf{U}^{(t)}(\odot^{(t)}\mathbf{U})^{\top})^{\top} \big)}{2\, \partial \mathbf{U}^{(t)}}
= \frac{\partial \operatorname{Tr}(\mathbf{X}_{(t)} \mathbf{X}_{(t)}^{\top})}{2\, \partial \mathbf{U}^{(t)}} - \frac{\partial \operatorname{Tr}\big( 2 \mathbf{X}_{(t)} (\odot^{(t)}\mathbf{U}) (\mathbf{U}^{(t)})^{\top} \big)}{2\, \partial \mathbf{U}^{(t)}} + \frac{\partial \operatorname{Tr}\big( (\mathbf{U}^{(t)}(\odot^{(t)}\mathbf{U})^{\top}) (\mathbf{U}^{(t)}(\odot^{(t)}\mathbf{U})^{\top})^{\top} \big)}{2\, \partial \mathbf{U}^{(t)}}
= -\mathbf{X}_{(t)} (\odot^{(t)}\mathbf{U}) + \mathbf{U}^{(t)} (\odot^{(t)}\mathbf{U})^{\top} (\odot^{(t)}\mathbf{U}).  (15)

Another way to compute ∂f^{(t)}/∂U^{(t)} is given in [39], which yields the same result as (15). Following the work in [39], we denote

\Gamma^{(t)} = (\odot^{(t)}\mathbf{U})^{\top} (\odot^{(t)}\mathbf{U}) = \big( (\mathbf{U}^{(1)})^{\top} \mathbf{U}^{(1)} \big) \ast \cdots \ast \big( (\mathbf{U}^{(t-1)})^{\top} \mathbf{U}^{(t-1)} \big) \ast \big( (\mathbf{U}^{(t+1)})^{\top} \mathbf{U}^{(t+1)} \big) \ast \cdots \ast \big( (\mathbf{U}^{(T)})^{\top} \mathbf{U}^{(T)} \big).  (16)


Algorithm 1: The SGDClus algorithm.

Input: X, K, λ
Output: {U^{(t)}}_{t=1}^{T}
(1) Initialize {U^{(t)}}_{t=1}^{T}
(2) Set iter ← 1
(3) repeat
(4)   for t ← 1 to T do
(5)     Set η
(6)     Update U^{(t)} according to (18)
(7)     Normalize U^{(t)} according to (19)
(8)   end for
(9)   Set iter ← iter + 1
(10) until L(X; U^{(1)}, U^{(2)}, ..., U^{(T)}) is unchanged or iter reaches the maximum number of iterations

Therefore, the partial derivative of L is given by

\frac{\partial \mathcal{L}}{\partial \mathbf{U}^{(t)}} = -\mathbf{X}_{(t)} (\odot^{(t)}\mathbf{U}) + \mathbf{U}^{(t)} \Gamma^{(t)} + \lambda \mathbf{U}^{(t)}.  (17)

And (12) can be rewritten as

\mathbf{U}^{(t)} \leftarrow \mathbf{U}^{(t)} - \eta \frac{\partial \mathcal{L}}{\partial \mathbf{U}^{(t)}} = \mathbf{U}^{(t)} \big( \mathbf{I} - \eta (\Gamma^{(t)} + \lambda \mathbf{I}) \big) + \eta \mathbf{X}_{(t)} (\odot^{(t)}\mathbf{U}),  (18)

where I is an identity matrix. Note that the {U^{(t)}}_{t=1}^{T} derived by (18) do not satisfy the first and second constraints in (9). To satisfy these two constraints, we can normalize each row of {U^{(t)}}_{t=1}^{T} by

u^{(t)}_{nk} \leftarrow \frac{u^{(t)}_{nk}}{\sum_{k'=1}^{K} u^{(t)}_{nk'}}.  (19)

Furthermore, the pseudocode of SGDClus is given in Algorithm 1; a minimal code sketch of one sweep follows.
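The following Python sketch implements one sweep of the update (18) followed by the normalization (19) for small dense tensors; the helper names (unfold, khatri_rao_except, gamma_except) are ours, and sparse tensors would need the coordinate-format variant discussed in Section 4.5:

import numpy as np

def unfold(X, t):
    # Mode-t matricization X_(t); columns ordered with the lowest mode varying fastest.
    return np.reshape(np.moveaxis(X, t, 0), (X.shape[t], -1), order='F')

def khatri_rao_except(U, t):
    # (⊙^(t)U) = U^(T) ⊙ ... ⊙ U^(t+1) ⊙ U^(t-1) ⊙ ... ⊙ U^(1).
    mats = [U[i] for i in reversed(range(len(U))) if i != t]
    out, K = mats[0], mats[0].shape[1]
    for M in mats[1:]:
        out = np.einsum('ik,jk->ijk', out, M).reshape(-1, K)
    return out

def gamma_except(U, t):
    # Γ^(t) from (16): Hadamard product of the Gram matrices of all factors except U^(t).
    K = U[0].shape[1]
    G = np.ones((K, K))
    for i, Ui in enumerate(U):
        if i != t:
            G = G * (Ui.T @ Ui)
    return G

def sgdclus_sweep(X, U, eta, lam):
    # One SGDClus pass over all modes: update (18), then row-normalize (19).
    K = U[0].shape[1]
    for t in range(len(U)):
        G = gamma_except(U, t)
        U[t] = U[t] @ (np.eye(K) - eta * (G + lam * np.eye(K))) \
               + eta * (unfold(X, t) @ khatri_rao_except(U, t))
        U[t] = U[t] / U[t].sum(axis=1, keepdims=True)  # (19); assumes nonnegative rows
    return U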

4.3.2. SOSClus. In SOSClus, we apply the second-order stochastic algorithm to the clustering problem in (9). According to (11), each factor matrix U^{(t)}, for t = 1, 2, ..., T, is updated by the rule

\mathbf{U}^{(t)} \leftarrow \mathbf{U}^{(t)} - \eta \left( \frac{\partial^2 \mathcal{L}}{\partial^2 \mathbf{U}^{(t)}} \right)^{-1} \frac{\partial \mathcal{L}}{\partial \mathbf{U}^{(t)}}.  (20)

According to (17), we can obtain the second-order partial derivative of L with respect to U^{(t)}, that is,

\frac{\partial^2 \mathcal{L}}{\partial^2 \mathbf{U}^{(t)}} = \Gamma^{(t)} + \lambda \mathbf{I}.  (21)

By substituting (17) and (21) into (20), we have

\mathbf{U}^{(t)} \leftarrow (1 - \eta) \mathbf{U}^{(t)} + \eta \mathbf{X}_{(t)} (\odot^{(t)}\mathbf{U}) (\Gamma^{(t)} + \lambda \mathbf{I})^{-1}.  (22)

Algorithm 2: The SOSClus algorithm.

Input: X, K, λ
Output: {U^{(t)}}_{t=1}^{T}
(1) Initialize {U^{(t)}}_{t=1}^{T}
(2) Set iter ← 1
(3) repeat
(4)   for t ← 1 to T do
(5)     Set η
(6)     Compute U^{(t)}_opt according to (23)
(7)     Update U^{(t)} according to (24)
(8)     Normalize U^{(t)} according to (19)
(9)   end for
(10) Set iter ← iter + 1
(11) until L(X; U^{(1)}, U^{(2)}, ..., U^{(T)}) is unchanged or iter reaches the maximum number of iterations

Note that X_(t)(⊙^(t)U)(Γ^(t) + λI)^{-1} is the General Gradient-based Optimization (OPT) [39] solution for updating U^{(t)} in the regularized CP decomposition, obtained by setting (17) equal to zero; see the details of the proof in [39]. Let

\mathbf{U}^{(t)}_{\text{opt}} = \mathbf{X}_{(t)} (\odot^{(t)}\mathbf{U}) (\Gamma^{(t)} + \lambda \mathbf{I})^{-1}.  (23)

Therefore, the updating rule of U^{(t)} in SOSClus is, intuitively, a weighted average of the current solution and the OPT solution, that is,

\mathbf{U}^{(t)} \leftarrow (1 - \eta) \mathbf{U}^{(t)} + \eta \mathbf{U}^{(t)}_{\text{opt}}.  (24)

Actually, SOSClus is a general extension of General Gradient-based Optimization (OPT) [39] and of the ALS with step size restriction in the randomized block sampling method [41]. In (24), when the learning rate η = 1, we get the OPT solution; when the regularization parameter λ = 0, SOSClus becomes the ALS with step size restriction in the randomized block sampling method.

Similar to SGDClus, the {U^{(t)}}_{t=1}^{T} derived by (24) in SOSClus also do not satisfy the first and second constraints in (9), so we normalize each row of {U^{(t)}}_{t=1}^{T} according to (19). The pseudocode of SOSClus is given in Algorithm 2, and a minimal sketch of one sweep follows.
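Reusing the helpers (unfold, khatri_rao_except, gamma_except) and the numpy import from the SGDClus sketch above, one SOSClus sweep per (23)-(24) might look as follows; again, this is a sketch under the same dense-tensor assumption:

def sosclus_sweep(X, U, eta, lam):
    # One SOSClus pass: OPT solution (23), weighted average (24), normalization (19).
    K = U[0].shape[1]
    for t in range(len(U)):
        G = gamma_except(U, t)
        # Solve against (Γ^(t) + λI) instead of forming its inverse explicitly.
        U_opt = np.linalg.solve((G + lam * np.eye(K)).T,
                                (unfold(X, t) @ khatri_rao_except(U, t)).T).T  # (23)
        U[t] = (1.0 - eta) * U[t] + eta * U_opt                                # (24)
        U[t] = U[t] / U[t].sum(axis=1, keepdims=True)                          # (19)
    return U

Solving a linear system rather than explicitly inverting (Γ^(t) + λI) is numerically preferable, although for a K × K matrix with small K the difference is minor.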

4.4. Feasibility Analysis

Theorem 4. The CP decomposition of X obtains the clustering of multityped objects in the heterogeneous information network G = (V, E) simultaneously.

Proof. Since the proofs for the different types of objects in the heterogeneous information network G = (V, E) are similar, we describe the process for a single type of objects. Without loss of generality, we detail the proof for the tth type of objects.

Given a heterogeneous information network G = (V, E) and its tensor representation X, the nonzero elements in X represent the input gene-networks in G = (V, E), which


we want to partition into K clusters C_1, C_2, ..., C_K. The centre of cluster C_k is denoted by c_k. By using the coordinate format [47] as the sparse representation of X, the gene-networks can be denoted as a matrix M ∈ R^{nnz(X) × T}, where nnz(X) is the number of nonzero elements in X. Each row m_n ∈ M, n = 1, 2, ..., nnz(X), gives the subscripts of the corresponding nonzero element in X. In other words, m_n represents a gene-network, and its entries m_{nt} ∈ m_n, t = 1, 2, ..., T, are the subscripts of the objects contained in the gene-network.

The traditional clustering approach, such as K-means, minimizes the sum of differences between the individual gene-networks in each cluster and the corresponding cluster centres, that is,

\min_{p_{nk},\, \mathbf{c}_k} \sum_{n=1}^{nnz(\mathcal{X})} \left\| \mathbf{m}_n - \sum_{k=1}^{K} p_{nk} \mathbf{c}_k \right\|_F^2,  (25)

where p_{nk} is the probability of gene-network m_n belonging to the kth cluster. We can also rewrite the problem from the new perspective of clustering the individual objects in the gene-networks, as follows:

\min_{p_{nt,k},\, c_{kt}} \sum_{n=1}^{nnz(\mathcal{X})} \sum_{t=1}^{T} \left\| m_{nt} - \sum_{k=1}^{K} p_{nt,k}\, c_{kt} \right\|_F^2,  (26)

where p_{nt,k} is the probability of object v^t_n belonging to the kth cluster.

In matrix form, K-means can be formalized as

\min_{\mathbf{P},\, \mathbf{C}} \| \mathbf{M} - \mathbf{P}\mathbf{C} \|_F^2,  (27)

where P is the cluster indication matrix and C holds the cluster centres.

By matricization of X along the tth mode, the CP decomposition of X in (3) can be rewritten as

\min_{\mathbf{U}^{(1)}, \mathbf{U}^{(2)}, \ldots, \mathbf{U}^{(T)}} \left\| \mathbf{X}_{(t)} - \mathbf{U}^{(t)} (\odot^{(t)}\mathbf{U})^{\top} \right\|_F^2.  (28)

X_(t) is the matricization of X along the tth mode, and M is the sparse representation of X. Let U^{(t)} = P and (⊙^(t)U)^⊤ = C; that is, U^{(t)} is the cluster indication matrix for the tth type of objects, and (⊙^(t)U)^⊤ holds the cluster centres. So the CP decomposition in (3) is equivalent to K-means clustering for the tth type of objects in the heterogeneous information network G = (V, E).

By matricizing X in (3) along different modes, we can prove that the CP decomposition is equivalent to K-means clustering for the other types of objects. So the CP decomposition of X obtains the clustering of multityped objects in the heterogeneous information network G = (V, E) simultaneously.

It is worth noting that CP decomposition based clustering is a soft clustering method: the factor matrices indicate the probability of objects belonging to the corresponding clusters. Soft clustering is more in line with reality, because many objects in heterogeneous information networks may belong to several clusters; in other words, the clusters are overlapping. In some cases, the overlapping clusters need to be translated into nonoverlapping clusters, which can be achieved with different approaches, such as running K-means on the rows of the factor matrices. Usually, nonoverlapping clusters can be obtained by simply assigning each object to the cluster with the largest entry in the corresponding row of the factor matrix, as in the one-line sketch below.
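A sketch of this hardening step, assuming U_t holds the soft memberships of the tth object type as a numpy array (the sample values are ours):

import numpy as np

U_t = np.array([[0.7, 0.2, 0.1],
                [0.1, 0.3, 0.6]])   # hypothetical soft memberships of two objects
labels_t = np.argmax(U_t, axis=1)   # hard labels: array([0, 2])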

4.5. Time Complexity Analysis. The main time consumption of updating each factor matrix U^{(t)} in SGDClus is computing ∂L/∂U^{(t)}. According to (17), we need to calculate X_(t)(⊙^(t)U) and U^{(t)}Γ^{(t)}, respectively. Since U^{(t)} ∈ R^{N_t × K} and X_(t) ∈ R^{N_t × ∏_{i=1, i≠t}^{T} N_i}, we have X_(t)(⊙^(t)U) ∈ R^{N_t × K}, Γ^{(t)} ∈ R^{K × K}, and U^{(t)}Γ^{(t)} ∈ R^{N_t × K}.

Firstly, if we successively calculated the Khatri-Rao product of T − 1 matrices followed by a matrix-matrix multiplication when computing X_(t)(⊙^(t)U), the intermediate results would be of very large size and the computational cost would be very expensive. In practice, we can reduce the complexity by skipping the unnecessary calculations. Consider an element of X_(t)(⊙^(t)U), that is,

\big( \mathbf{X}_{(t)} (\odot^{(t)}\mathbf{U}) \big)_{n_t, k} = \sum_{n_i,\, i \neq t} \left( x_{n_1 n_2 \cdots n_T} \prod_{i=1, i \neq t}^{T} u^{(i)}_{n_i k} \right).  (29)

Here x_{n_1 n_2 ⋯ n_T} is an element in the matricization of X, which represents a corresponding gene-network in the heterogeneous information network. When x_{n_1 n_2 ⋯ n_T} = 0, we can skip the corresponding Khatri-Rao product; hence, only the nonzero elements in X need to be computed. Therefore, the time complexity for computing X_(t)(⊙^(t)U) is O(nnz(X) N_t K). A code sketch of this nonzero-driven computation follows.
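A sketch of the computation in Python, assuming X is stored in coordinate format as an nnz × T integer index array subs and a value vector vals (the names are ours):

import numpy as np

def mttkrp_coo(subs, vals, U, t):
    # Compute X_(t)(⊙^(t)U) by iterating only over the nonzero elements, cf. (29).
    N_t, K = U[t].shape
    out = np.zeros((N_t, K))
    for row, x in zip(subs, vals):
        prod = x * np.ones(K)
        for i, n_i in enumerate(row):
            if i != t:
                prod = prod * U[i][n_i]   # accumulate prod_{i != t} u^(i)_{n_i, k}
        out[row[t]] += prod
    return out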

Secondly, an element of U^{(t)}Γ^{(t)} is given by

\big( \mathbf{U}^{(t)} \Gamma^{(t)} \big)_{n_t, k} = \sum_{j=1}^{K} \left( u^{(t)}_{n_t j} \prod_{i=1, i \neq t}^{T} \sum_{n_i=1}^{N_i} u^{(i)}_{n_i j}\, u^{(i)}_{n_i k} \right).  (30)

So the time complexity of computing U^{(t)}Γ^{(t)} is O((N − N_t)K^2), where N = Σ_{t=1}^{T} N_t is the total number of objects in the network.

Above all, the time complexity of each iteration in SGDClus is O(nnz(X)NK + (T − 1)NK^2). Note that in real-world heterogeneous information networks, the number of clusters K and the number of object types T are usually far less than N, that is, K ≪ N and T ≪ N.

According to (22), the time complexity of updating each factor matrix U^{(t)} in SOSClus is composed of three components: X_(t)(⊙^(t)U), (Γ^{(t)} + λI)^{-1}, and their product. Compared to SGDClus, only the time for computing the inverse of (Γ^{(t)} + λI) is additional in SOSClus. Since (Γ^{(t)} + λI) is a K × K square matrix, computing its inverse costs O(K^3). Nevertheless, O(K^3) is usually negligible because K ≪ N.

5. Experiments and Results

In this section, we present several experiments on synthetic and real-world heterogeneous information network datasets and compare the performance with a number of state-of-the-art clustering methods.

5.1. Evaluation Metrics and Experimental Setting

5.1.1. Evaluation Metrics. In the experiments, we adopt the Normalized Mutual Information (NMI) [48] and Accuracy (AC) as our performance measurements.

NMI measures the mutual dependence between the clustering result and the ground truth. Given N objects, K clusters, one clustering result, and the ground truth classes for the objects, let n(i, j), i, j = 1, 2, ..., K, be the number of objects labeled i in the clustering result but labeled j in the ground truth. The joint distribution can be defined as p(i, j) = n(i, j)/N, the marginal distribution of rows can be calculated as p_1(j) = Σ_{i=1}^{K} p(i, j), and the marginal distribution of columns can be calculated as p_2(i) = Σ_{j=1}^{K} p(i, j). Then the NMI is defined as

\text{NMI} = \frac{\sum_{i=1}^{K} \sum_{j=1}^{K} p(i, j) \log \big( p(i, j) / (p_1(j)\, p_2(i)) \big)}{\sqrt{\sum_{j=1}^{K} p_1(j) \log p_1(j) \cdot \sum_{i=1}^{K} p_2(i) \log p_2(i)}}.  (31)

The NMI ranges from 0 to 1; the larger the value of NMI, the better the clustering result.
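A direct transcription of (31) into Python might read as follows (a sketch; the function and argument names are ours, and cluster and class labels are assumed to be integers 0, ..., K−1):

import numpy as np

def nmi(labels_pred, labels_true, K):
    # NMI of one clustering result against the ground truth, following (31).
    n = np.zeros((K, K))
    for i, j in zip(labels_pred, labels_true):
        n[i, j] += 1
    p = n / n.sum()            # joint distribution p(i, j)
    p1 = p.sum(axis=0)         # marginal distribution of rows, p1(j)
    p2 = p.sum(axis=1)         # marginal distribution of columns, p2(i)
    num = 0.0
    for i in range(K):
        for j in range(K):
            if p[i, j] > 0:
                num += p[i, j] * np.log(p[i, j] / (p1[j] * p2[i]))
    den = np.sqrt((p1[p1 > 0] * np.log(p1[p1 > 0])).sum()
                  * (p2[p2 > 0] * np.log(p2[p2 > 0])).sum())
    return num / den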

AC is used to compute the clustering accuracy, that is, the percentage of correctly clustered objects. AC is defined as

\text{AC} = \frac{\sum_{t=1}^{T} \sum_{n=1}^{N_t} \delta\big( \text{map}(v^t_n), \text{label}(v^t_n) \big)}{\sum_{t=1}^{T} N_t},  (32)

where map(v^t_n) is the cluster label of object v^t_n, label(v^t_n) is the ground truth class of object v^t_n, and δ(·) is an indicator function:

\delta(\cdot) = \begin{cases} 1, & \text{if } \text{map}(v^t_n) = \text{label}(v^t_n), \\ 0, & \text{if } \text{map}(v^t_n) \neq \text{label}(v^t_n). \end{cases}  (33)

Since both NMI and AC measure the performance of clustering one type of objects, the weighted averages of NMI and AC are also used to measure the overall performance of the proposed algorithms and the other state-of-the-art methods:

\overline{\text{NMI}} = \frac{\sum_{t=1}^{T} N_t\, (\text{NMI})_t}{\sum_{t=1}^{T} N_t}, \qquad \overline{\text{AC}} = \frac{\sum_{t=1}^{T} N_t\, (\text{AC})_t}{\sum_{t=1}^{T} N_t}.  (34)
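Sketches of (32)-(34), assuming hard labels have already been produced and mapped to ground-truth classes (the names are ours):

import numpy as np

def accuracy(mapped, truth):
    # AC per (32)-(33) for one object type: fraction of matching labels.
    return float(np.mean(np.asarray(mapped) == np.asarray(truth)))

def weighted_average(metric_per_type, N_per_type):
    # Weighted average of per-type NMI or AC values, per (34).
    N = np.asarray(N_per_type, dtype=float)
    return float((N * np.asarray(metric_per_type)).sum() / N.sum())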

Table 1: The synthetic datasets.

Synthetic datasets   T   K   S                                 D
Syn1                 2   2   1M = 1000 × 1000                  0.1
Syn2                 2   4   10M = 1000 × 10000                0.01
Syn3                 4   2   100M = 100 × 100 × 100 × 100      0.1
Syn4                 4   4   1000M = 100 × 100 × 100 × 1000    0.01

T is the number of object types in the heterogeneous information network and also the number of modes in the tensor. K is the number of clusters. S is the network scale, S = N_1 × N_2 × ⋯ × N_T. D is the density of the tensor, that is, the percentage of nonzero elements in the tensor, D = nnz(X)/S.

5.1.2. Experimental Setting. In order to compare the performance of our proposed SGDClus and SOSClus with the other methods impartially, all methods share a common stopping condition, that is,

\frac{\left| \mathcal{L}_{\text{iter}} - \mathcal{L}_{\text{iter}-1} \right|}{\mathcal{L}_{\text{iter}-1}} \leq 10^{-6},  (35)

or iter reaches the maximum number of iterations. L_iter and L_{iter−1} are the values of the loss function L at the current, that is, (iter)th, iteration and the previous, that is, (iter − 1)th, iteration, respectively. We set the maximum number of iterations to 1000. Throughout the experiments, the regularization parameter λ in SGDClus and SOSClus is fixed as λ = 0.001.
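Condition (35) translates directly into code, for example (a sketch; the sample loss values are placeholders):

def converged(L_iter, L_prev, tol=1e-6):
    # Stopping condition (35): relative change of the loss between two iterations.
    return abs(L_iter - L_prev) / abs(L_prev) <= tol

print(converged(0.9999994, 1.0))  # True: the relative change is 6e-7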

All experiments are implemented in MATLAB R2015a (version 8.5.0), 64-bit, and the MATLAB Tensor Toolbox (version 2.6, http://www.sandia.gov/~tgkolda/TensorToolbox) is used in our experiments. Since heterogeneous information networks are often sparse in real-world scenarios, that is, most elements in the tensor X are zeros, we use the sparse format of X proposed in [47], which is supported by the MATLAB Tensor Toolbox. The experimental results are the averages obtained by running the algorithms ten times on the corresponding datasets.

5.2. Experiments on Synthetic Datasets

5.2.1. The Synthetic Datasets Description. The purpose of using synthetic datasets is to examine whether the proposed tensor CP decomposition clustering framework works well, since the detailed cluster structures of the synthetic datasets are known. In order to make the synthetic datasets similar to a realistic situation, we assume that the distribution of the different types of objects appearing in a gene-network follows Zipf's law (see details online: https://en.wikipedia.org/wiki/Zipf's_law). Zipf's law is defined by f(r; ρ, N) = r^{-ρ} / Σ_{n=1}^{N} n^{-ρ}, where N is the number of objects, r is the object index, and ρ is the parameter characterizing the distribution; f(r; ρ, N) denotes the frequency of the rth object appearing in a gene-network. We set ρ = 0.95 and generate 4 synthetic datasets with different parameters. The details of these synthetic datasets are shown in Table 1, and a generation sketch follows.
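A sketch of such a generator in Python; the sampling routine and names are ours, and we do not claim it reproduces the authors' exact datasets:

import numpy as np

def generate_synthetic(shape, nnz, rho=0.95, seed=0):
    # Draw nnz gene-networks; in each mode, object indices follow Zipf's law
    # f(r; rho, N) = r^(-rho) / sum_{n=1..N} n^(-rho).
    rng = np.random.default_rng(seed)
    subs = np.empty((nnz, len(shape)), dtype=int)
    for t, N in enumerate(shape):
        p = np.arange(1, N + 1, dtype=float) ** (-rho)
        subs[:, t] = rng.choice(N, size=nnz, p=p / p.sum())
    return subs, np.ones(nnz)  # coordinate-format subscripts and values
                               # (duplicate subscripts, if any, are ignored here)

# Example: a Syn1-like dataset (1000 x 1000, density 0.1 -> 100000 nonzeros).
subs, vals = generate_synthetic((1000, 1000), nnz=100000)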

5.2.2. Experimental Results. In the beginning of the experiments, we set a common learning rate η = 1/(iter + 1) for SGDClus and SOSClus. We find that SOSClus has a faster convergence speed and better robustness with respect to the learning rate, which is clearly shown in Figure 3. Although SGDClus may eventually converge to a local minimum, the efficiency of the optimization near a local minimum is not all roses: as shown in Figure 3, the solutions of SGDClus swing around a local minimum. This phenomenon shows that the convergence speed of SGDClus is sensitive to the choice of the learning rate η.

[Figure 3: Convergence speed of SGDClus and SOSClus on the 4 synthetic datasets with a common learning rate, that is, η = 1/(iter + 1). Two panels plot error versus iteration (0-50), with curves for Syn1-Syn4: "SGDClus on synthetic datasets" and "SOSClus on synthetic datasets".]

Then we modify the learning rate of SGDClus to η = 1/(iter + c), where c is a constant optimized in the experiments; in practice, c = 27855 for SGDClus running on Syn3 and c = 430245 for SGDClus running on Syn4. The performance comparison of SGDClus with learning rate η = 1/(iter + c) and SOSClus with learning rate η = 1/(iter + 1) on Syn3 and Syn4 is shown in Figure 4. With the optimized learning rate, SGDClus converges to a local minimum quickly; however, compared to SOSClus, SGDClus still has no advantage. The hand-drawn blue circles over the curves of SOSClus in Figure 4 show that SOSClus can escape from a local minimum and find the global minimum, while SGDClus merely settles into the first local minimum it reaches.

[Figure 4: Convergence speed of SGDClus and SOSClus on Syn3 and Syn4 with different learning rates. The learning rate for SOSClus is η = 1/(iter + 1), while that for SGDClus is η = 1/(iter + c), where c is a constant optimized in the experiments. Circles mark where SOSClus escaped from a local minimum.]

According to (23) and (24), we obtain the solutions of OPT as a by-product of running SOSClus. We therefore compare the AC and NMI of OPT, SGDClus, and SOSClus on the 4 synthetic datasets, as shown in Figure 5. With an increasing number of object types in the heterogeneous information network, the AC and NMI of SOSClus and OPT increase distinctly, while the performance of SGDClus barely changes. When T = 2, the AC and NMI of the three methods on Syn1 and Syn2 are almost equal and low; however, the AC and NMI of SOSClus increase to 1 when T = 4. Since the histograms of OPT, SGDClus, and SOSClus on Syn1 and Syn2 are almost the same, and those on Syn3 and Syn4 are also similar, the parameters K and S have no significant effect on the performance. Generally, a larger density D and more object types T in the network result in higher AC and NMI for SOSClus.

[Figure 5: The AC and NMI of OPT, SGDClus, and SOSClus on the 4 synthetic datasets. Two bar charts (AC and NMI, 0-1) grouped by Syn1-Syn4.]

Overall, in the experiments on the 4 synthetic datasets, SOSClus shows excellent performance: it has a faster convergence speed and better robustness with respect to the learning rate, and it performs better on AC and NMI because it can escape from a local minimum and find the global minimum.

5.3. Performance Comparison on a Real-World Dataset

5.3.1. Real-World Dataset Description. The experiments on the real-world dataset are used to compare the performance of the tensor CP decomposition clustering framework with other state-of-the-art methods.

The real-world dataset extracted from the DBLP database is the DBLP-four-area dataset, which can be downloaded from http://web.cs.ucla.edu/~yzsun/data/DBLP_four_area.zip. It is a four-research-area subset of DBLP and was used in [2, 3, 12, 13, 15, 16, 18]. The four research areas in the DBLP-four-area dataset are database (DB), data mining (DM), machine learning (ML), and information retrieval (IR). There are five representative conferences in each area, and all related authors, papers published in these conferences, and terms contained in these papers' titles are included. The DBLP-four-area dataset contains 14376 papers with 100 labeled, 14475 authors with 4057 labeled, 20 labeled conferences, and 8920 terms. The density of the DBLP-four-area dataset is 9.01935 × 10^{-9}, so we construct a 4-mode tensor of size 14376 × 14475 × 20 × 8920 with 334832 nonzero elements (see the sketch below). We compare the performance of the tensor CP decomposition clustering framework with several other methods on the labeled records in this dataset.
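A sketch of assembling such a tensor in coordinate format from co-occurrence records; the record list here is placeholder data, not the actual DBLP extraction:

import numpy as np

# Hypothetical records: one (paper, author, conference, term) index tuple per
# gene-network, i.e., per nonzero element of the 4-mode tensor.
records = [(0, 0, 0, 0), (0, 1, 0, 2), (1, 2, 3, 5)]   # placeholder data
shape = (14376, 14475, 20, 8920)    # papers x authors x conferences x terms
subs = np.array(records, dtype=int)  # nnz x 4 subscript matrix (COO format)
vals = np.ones(len(records))         # tensor values; all ones here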

5.3.2. Comparative Methods

(i) NetClus (See [2]). An extended version of RankClus [1] that can deal with networks following the star network schema. The time complexity of NetClus for clustering each object type in each iteration is O(K|E| + (K^2 + K)N), where K is the number of clusters, |E| is the number of edges in the network, and N is the total number of objects in the network.

(ii) PathSelClus (See [15, 16]). A clustering method based on predefined metapaths, which requires user guidance. In PathSelClus, the distance between objects of the same type is measured by PathSim [3], and the method starts from seeds given by the user. The time complexity of PathSelClus for clustering each object type in each iteration is O((K + 1)|P| + KN), where |P| is the number of metapath instances in the network. The time complexity of PathSim used by PathSelClus for clustering each object type is O(Nd), where d is the average degree of objects.

(iii) FctClus (See [13]). A recently proposed clustering method for heterogeneous information networks. As with NetClus, the FctClus method can deal with networks following the star network schema. The time complexity of FctClus for clustering each object type in each iteration is O(K|E| + TKN).

5.3.3. Experimental Results. As the baseline methods can only deal with heterogeneous information networks of specific schemas, we must construct different subnetworks for them. For NetClus and FctClus, the network is organized under a star network schema as in [2, 13], where paper (P) is the centre type and author (A), conference (C), and term (T) are the attribute types. For PathSelClus, we select the metapaths P-T-P, A-P-C-P-A, and C-P-T-P-C to cluster the papers, authors, and conferences, respectively. And in PathSelClus, we give each cluster one seed to start.

Table 2: AC of experiments on the DBLP-four-area dataset.

AC            OPT      SGDClus   SOSClus   NetClus   PathSelClus   FctClus
Paper         0.5882   0.8476    0.9007    0.7154    0.7551        0.7887
Author        0.5872   0.8486    0.9486    0.7177    0.7951        0.8008
Conference    1        0.99      1         0.9172    0.9950        0.9031
avg(AC)       0.5892   0.8493    0.9477    0.7186    0.7951        0.8010

Table 3: NMI of experiments on the DBLP-four-area dataset.

NMI           OPT      SGDClus   SOSClus   NetClus   PathSelClus   FctClus
Paper         0.6557   0.6720    0.8812    0.5402    0.6142        0.7152
Author        0.6539   0.8872    0.8822    0.5488    0.6770        0.6012
Conference    1        0.8497    1         0.8858    0.9906        0.8248
avg(NMI)      0.6556   0.8778    0.8827    0.5503    0.6770        0.6050

Table 4: Running time of experiments on the DBLP-four-area dataset.

Running time (s)   OPT    SGDClus   SOSClus   NetClus   PathSelClus   FctClus
Paper              —      —         —         8026      5423          8084
Author             —      —         —         7437      6811          7749
Conference         —      —         —         6584      6293          6698
Total time         6726   4324     8184      22047     18527         22531

We model the DBLP-four-area dataset as a 4-mode tensor, where each mode represents one object type; the 4 modes are author (A), paper (P), conference (C), and term (T), respectively. Actually, the order of the object types is insignificant. Each element of the tensor represents a gene-network in the heterogeneous information network. In the experiments, we set the learning rate for SOSClus to η = 1/(iter + 1) and an optimized learning rate η = 1/(iter + c), with c = 1000125, for SGDClus. By running SOSClus, we obtain the solutions of OPT as a by-product, so we compare the experimental results of OPT, SGDClus, and SOSClus on the DBLP-four-area dataset with the three baseline methods; see the details in Tables 2, 3, and 4.

In Tables 2 and 3, SOSClus performs best on average AC and NMI, and SGDClus takes second place. All methods achieve satisfactory AC and NMI on the conferences, since there are only 20 conferences in the network. SGDClus takes the shortest running time, and OPT and SOSClus have an obvious advantage in running time compared with the other baselines. The time complexity of SGDClus is O(nnz(X)NK + (T − 1)K^2 N), and the time complexity of SOSClus is O(nnz(X)NK + (T − 1)K^2 N + K^3), where nnz(X) is the number of nonzero elements in X, that is, the number of gene-networks in the heterogeneous information network. We have nnz(X) < |P| ≪ |E|, K ≪ N, and T ≪ N. Compared with the time complexity of the three baselines, SGDClus and SOSClus have a slight disadvantage. However, it is worth noting that the three baselines can only cluster one type of objects in the network per run, while OPT, SGDClus, and SOSClus obtain the clusters of all types of objects simultaneously in a single run; this is why only the total time is shown for OPT, SGDClus, and SOSClus in Table 4. Moreover, the total time of OPT, SGDClus, and SOSClus is of the same order of magnitude as the running time of the other baselines for clustering each single type of objects, which is consistent with the comparison of the time complexities.

6. Conclusion

In this paper, a tensor CP decomposition method for clustering heterogeneous information networks is presented. In the tensor CP decomposition clustering framework, each type of objects in the heterogeneous information network is modeled as one mode of a tensor, and the gene-networks are modeled as the elements of the tensor. In other words, the tensor CP decomposition clustering framework can model different types of objects and semantic relations in the heterogeneous information network without restrictions on the network schema. In addition, two stochastic gradient descent algorithms, named SGDClus and SOSClus, are designed. SGDClus and SOSClus can cluster all types of objects and the gene-networks simultaneously in a single run. The proposed algorithms outperformed other state-of-the-art clustering methods in terms of AC, NMI, and running time.

Notations

a:  A scalar (lowercase letter)
\mathbf{a}:  A vector (boldface lowercase letter)
\mathbf{A}:  A matrix (boldface capital letter)
\mathcal{X}:  A tensor (calligraphic letter)
x_{i_1 i_2 \cdots i_N}:  The (i_1, i_2, ..., i_N)th element of an Nth-order tensor \mathcal{X}
\mathbf{X}_{(n)}:  Matricization of \mathcal{X} along the nth mode
\ast:  Hadamard product (element-wise product) of two tensors (or matrices or vectors) with the same dimension
\odot:  Khatri-Rao product of two matrices
\| \cdot \|_F:  Frobenius norm of a tensor (or matrix or vector)
\mathbf{a} \circ \mathbf{b}:  Outer product of two vectors.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported by the National Natural Science Foundation of China (no. 61401482 and no. 61401483).

References

[1] Y. Sun, J. Han, P. Zhao, Z. Yin, H. Cheng, and T. Wu, "RankClus: integrating clustering with ranking for heterogeneous information network analysis," in Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology (EDBT '09), pp. 565-576, Saint Petersburg, Russia, March 2009.
[2] Y. Sun, Y. Yu, and J. Han, "Ranking-based clustering of heterogeneous information networks with star network schema," in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '09), pp. 797-805, Paris, France, July 2009.
[3] Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu, "PathSim: meta path-based top-K similarity search in heterogeneous information networks," PVLDB, vol. 4, no. 11, pp. 992-1003, 2011.
[4] J. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics, pp. 281-297, University of California Press, Berkeley, Calif, USA, 1967.
[5] S. Guha, R. Rastogi, and K. Shim, "Cure: an efficient clustering algorithm for large databases," Information Systems, vol. 26, no. 1, pp. 35-58, 2001.
[6] M. Ester, H. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96), E. Simoudis, J. Han, and U. M. Fayyad, Eds., pp. 226-231, AAAI Press, Portland, Ore, USA, 1996.
[7] W. Wang, J. Yang, and R. R. Muntz, "STING: a statistical information grid approach to spatial data mining," in Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB '97), M. Jarke, M. J. Carey, K. R. Dittrich, F. H. Lochovsky, P. Loucopoulos, and M. A. Jeusfeld, Eds., pp. 186-195, Morgan Kaufmann, Athens, Greece, August 1997, http://www.vldb.org/conf/1997/P186.PDF.
[8] E. H. Ruspini, "New experimental results in fuzzy clustering," Information Sciences, vol. 6, no. 73, pp. 273-284, 1973.
[9] U. von Luxburg, "A tutorial on spectral clustering," Statistics and Computing, vol. 17, no. 4, pp. 395-416, 2007.
[10] C. Shi, Y. Li, J. Zhang, Y. Sun, and P. S. Yu, "A survey of heterogeneous information network analysis," IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 1, pp. 17-37, 2017.
[11] J. Han, Y. Sun, X. Yan, and P. S. Yu, "Mining knowledge from data: an information network analysis approach," in Proceedings of the IEEE 28th International Conference on Data Engineering (ICDE '12), pp. 1214-1217, IEEE, Washington, DC, USA, April 2012.
[12] J. Chen, W. Dai, Y. Sun, and J. Dy, "Clustering and ranking in heterogeneous information networks via gamma-poisson model," in Proceedings of the SIAM International Conference on Data Mining (SDM '15), pp. 424-432, May 2015.
[13] J. Yang, L. Chen, and J. Zhang, "FctClus: a fast clustering algorithm for heterogeneous information networks," PLoS ONE, vol. 10, no. 6, Article ID e0130086, 2015.
[14] C. Shi, R. Wang, Y. Li, P. S. Yu, and B. Wu, "Ranking-based clustering on general heterogeneous information networks by network projection," in Proceedings of the 23rd ACM International Conference on Information and Knowledge Management (CIKM '14), pp. 699-708, ACM, Shanghai, China, November 2014.
[15] Y. Sun, B. Norick, J. Han, X. Yan, P. S. Yu, and X. Yu, "Integrating meta-path selection with user-guided object clustering in heterogeneous information networks," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '12), pp. 1348-1356, ACM, Beijing, China, August 2012.
[16] Y. Sun, B. Norick, J. Han, X. Yan, P. S. Yu, and X. Yu, "PathSelClus: integrating meta-path selection with user-guided object clustering in heterogeneous information networks," ACM Transactions on Knowledge Discovery from Data, vol. 7, no. 3, pp. 723-724, 2013.
[17] X. Yu, Y. Sun, B. Norick, T. Mao, and J. Han, "User guided entity similarity search using meta-path selection in heterogeneous information networks," in Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM '12), pp. 2025-2029, ACM, November 2012.
[18] Y. Sun, C. C. Aggarwal, and J. Han, "Relation strength-aware clustering of heterogeneous information networks with incomplete attributes," Proceedings of the VLDB Endowment, vol. 5, no. 5, pp. 394-405, 2012.
[19] M. Zhang, H. Hu, Z. He, and W. Wang, "Top-k similarity search in heterogeneous information networks with x-star network schema," Expert Systems with Applications, vol. 42, no. 2, pp. 699-712, 2015.
[20] Y. Zhou and L. Liu, "Social influence based clustering of heterogeneous information networks," in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 338-346, ACM, Chicago, Ill, USA, August 2013.
[21] L. R. Tucker, "Some mathematical notes on three-mode factor analysis," Psychometrika, vol. 31, pp. 279-311, 1966.
[22] R. A. Harshman, "Foundations of the PARAFAC procedure: models and conditions for an 'explanatory' multi-mode factor analysis," UCLA Working Papers in Phonetics, 1969.
[23] H. A. L. Kiers, "Towards a standardized notation and terminology in multiway analysis," Journal of Chemometrics, vol. 14, no. 3, pp. 105-122, 2000.
[24] T. G. Kolda and B. W. Bader, "Tensor decompositions and applications," SIAM Review, vol. 51, no. 3, pp. 455-500, 2009.
[25] A. Cichocki, D. Mandic, L. De Lathauwer et al., "Tensor decompositions for signal processing applications: from two-way to multiway component analysis," IEEE Signal Processing Magazine, vol. 32, no. 2, pp. 145-163, 2015.
[26] W. Peng and T. Li, "Tensor clustering via adaptive subspace iteration," Intelligent Data Analysis, vol. 15, no. 5, pp. 695-713, 2011.
[27] J. Hastad, "Tensor rank is NP-complete," Journal of Algorithms: Cognition, Informatics and Logic, vol. 11, no. 4, pp. 644-654, 1990.
[28] S. Metzler and P. Miettinen, "Clustering Boolean tensors," Data Mining & Knowledge Discovery, vol. 29, no. 5, pp. 1343-1373, 2015.
[29] X. Cao, X. Wei, Y. Han, and D. Lin, "Robust face clustering via tensor decomposition," IEEE Transactions on Cybernetics, vol. 45, no. 11, pp. 2546-2557, 2015.
[30] I. Sutskever, R. Salakhutdinov, and J. B. Tenenbaum, "Modelling relational data using Bayesian clustered tensor factorization," in Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS '09), pp. 1821-1828, British Columbia, Canada, December 2009.
[31] B. Ermis, E. Acar, and A. T. Cemgil, "Link prediction in heterogeneous data via generalized coupled tensor factorization," Data Mining & Knowledge Discovery, vol. 29, no. 1, pp. 203-236, 2015.
[32] A. R. Benson, D. F. Gleich, and J. Leskovec, "Tensor spectral clustering for partitioning higher-order network structures," in Proceedings of the SIAM International Conference on Data Mining (SDM '15), pp. 118-126, Vancouver, Canada, May 2015.
[33] L. Xiong, X. Chen, T.-K. Huang, J. G. Schneider, and J. G. Carbonell, "Temporal collaborative filtering with Bayesian probabilistic tensor factorization," in Proceedings of the 10th SIAM International Conference on Data Mining (SDM '10), pp. 211-222, Columbus, Ohio, USA, May 2010.
[34] E. E. Papalexakis, L. Akoglu, and D. Ience, "Do more views of a graph help? Community detection and clustering in multi-graphs," in Proceedings of the 16th International Conference of Information Fusion (FUSION '13), pp. 899-905, Istanbul, Turkey, July 2013.
[35] M. Vandecappelle, M. Bousse, F. V. Eeghem, and L. D. Lathauwer, "Tensor decompositions for graph clustering," Internal Report 16-170, ESAT-STADIUS, KU Leuven, Leuven, Belgium, 2016, ftp://ftp.esat.kuleuven.be/pub/SISTA/sistakulak/reports/2016_Tensor_Graph_Clustering.pdf.
[36] W. Shao, L. He, and P. S. Yu, "Clustering on multi-source incomplete data via tensor modeling and factorization," in Proceedings of the 19th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD '15), pp. 485-497, Ho Chi Minh City, Vietnam, 2015.
[37] J. D. Carroll and J.-J. Chang, "Analysis of individual differences in multidimensional scaling via an n-way generalization of 'Eckart-Young' decomposition," Psychometrika, vol. 35, no. 3, pp. 283-319, 1970.
[38] L. De Lathauwer, B. De Moor, and J. Vandewalle, "On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors," SIAM Journal on Matrix Analysis and Applications, vol. 21, no. 4, pp. 1324-1342, 2000.
[39] E. Acar, D. M. Dunlavy, and T. G. Kolda, "A scalable optimization approach for fitting canonical tensor decompositions," Journal of Chemometrics, vol. 25, no. 2, pp. 67-86, 2011.
[40] S. Hansen, T. Plantenga, and T. G. Kolda, "Newton-based optimization for Kullback-Leibler nonnegative tensor factorizations," Optimization Methods & Software, vol. 30, no. 5, pp. 1002-1029, 2015.
[41] N. Vervliet and L. De Lathauwer, "A randomized block sampling approach to canonical polyadic decomposition of large-scale tensors," IEEE Journal on Selected Topics in Signal Processing, vol. 10, no. 2, pp. 284-295, 2016.
[42] R. Ge, F. Huang, C. Jin, and Y. Yuan, "Escaping from saddle points: online stochastic gradient for tensor decomposition," http://arxiv.org/abs/1503.02101.
[43] T. G. Kolda, "Multilinear operators for higher-order decompositions," Tech. Rep. SAND2006-2081, Sandia National Laboratories, 2006, http://www.osti.gov/scitech/biblio/923081.
[44] P. Paatero, "A weighted non-negative least squares algorithm for three-way 'PARAFAC' factor analysis," Chemometrics & Intelligent Laboratory Systems, vol. 38, no. 2, pp. 223-242, 1997.
[45] P. Paatero, "Construction and analysis of degenerate PARAFAC models," Journal of Chemometrics, vol. 14, no. 3, pp. 285-299, 2000.
[46] L. Bottou and N. Murata, "Stochastic approximations and efficient learning," in The Handbook of Brain Theory and Neural Networks, 2nd edition, 2002.
[47] B. W. Bader and T. G. Kolda, "Efficient MATLAB computations with sparse and factored tensors," SIAM Journal on Scientific Computing, vol. 30, no. 1, pp. 205-231, 2007.
[48] A. Strehl and J. Ghosh, "Cluster ensembles: a knowledge reuse framework for combining multiple partitions," Journal of Machine Learning Research, vol. 3, no. 3, pp. 583-617, 2003.


Page 6: A Tensor CP Decomposition Method for Clustering ...downloads.hindawi.com/journals/sp/2017/2803091.pdf · spectral clustering, are designed to analyze homogeneous informationnetworks

6 Scientific Programming

Input X 119870 120582Output U(119905)119879119905=1(1) Initialize U(119905)119879119905=1(2) Set iter larr 1(3) repeat(4) for 119905 larr 1 to119879 do(5) Set 120578(6) Update U(119905) according to (18)(7) Normalize U(119905) according to (19)(8) end for(9) Set iter larr iter + 1(10) until L(XU(1)U(2) U(119879)) unchanged or iter

reaches the maximum iterations

Algorithm 1 The SGDClus algorithm

Therefore the partial derivative ofL is given by

120597L120597U(119905) = minusX(119905) (⊙(119905)U) + U(119905)Γ(119905) + 120582U(119905) (17)

And (12) can be rewritten as

U(119905) larr997888 U(119905) minus 120578 120597L120597U(119905)= U(119905) (I minus 120578 (Γ(119905) + 120582I)) + 120578X(119905) (⊙(119905)U) (18)

where I is an identity matrix Note that U(119905)119879119905=1 derived by(18) do not satisfy the first and second constraints in (9) Tosatisfy these two constraints we can normalize each row ofU(119905)119879119905=1 by

119906(119905)119899119896 larr997888 119906(119905)119899119896sum119870119896=1 119906(119905)119899119896 (19)

Furthermore the pseudocode of SGDClus is given inAlgorithm 1

432 SOSClus In SOSClus we apply the second-orderstochastic algorithm to the clustering problem in (9) Accord-ing to (11) each factor matrix U(119905) for 119905 = 1 2 119879 isupdated by the rule

U(119905) larr997888 U(119905) minus 120578( 1205972L1205972U(119905))minus1 120597L120597U(119905) (20)

According to (17) we can obtain the second-order partialderivative ofL with respect to U(119905) that is

1205972L1205972U(119905) = Γ(119905) + 120582I (21)

By substituting (17) and (21) into (20) we have

U(119905) larr997888 (1 minus 120578)U(119905) + 120578X(119905) (⊙(119905)U) (Γ(119905) + 120582I)minus1 (22)

Input X 119870 120582Output U(119905)119879119905=1(1) Initialize U(119905)119879119905=1(2) Set iter larr 1(3) repeat(4) for 119905 larr 1 to 119879 do(5) Set 120578(6) Compute U(119905)opt according to (23)(7) Update U(119905) according to (24)(8) Normalize U(119905) according to (19)(9) end for(10) Set iter larr iter + 1(11) until L(XU(1)U(2) U(119879)) unchanged or iter

reaches the maximum iterations

Algorithm 2 The SOSClus algorithm

Note thatX(119905)(⊙(119905)U)(Γ(119905)+120582I)minus1 is theGeneralGradient-based Optimization (OPT) [39] solution for updating U(119905) inthe regularized CP decomposition by making (17) equal tozero See the details of proof in [39] Let

U(119905)opt = X(119905) (⊙(119905)U) (Γ(119905) + 120582I)minus1 (23)

Therefore the updating rule of U(119905) in SOSClus is a weightedaverage of the current solution and the OPT solution intu-itively that is

U(119905) larr997888 (1 minus 120578)U(119905) + 120578U(119905)opt (24)

Actually SOSClus is a general extension of GeneralGradient-based Optimization (OPT) [39] and the ALS withstep size restriction in randomized block sampling method[41] In (24) when the learning rate 120578 = 1 we get the OPTsolution When the regularization parameter 120582 = 0 SOSClusbecomes the ALS with step size restriction in randomizedblock sampling method

Similar to SGDClus, the $\{\mathbf{U}^{(t)}\}_{t=1}^{T}$ derived by (24) in SOSClus also do not satisfy the first and second constraints in (9), so we normalize each row of $\{\mathbf{U}^{(t)}\}_{t=1}^{T}$ according to (19). The pseudocode of SOSClus is given in Algorithm 2, and a minimal sketch of one update step follows.
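As before, this is an illustrative NumPy sketch of Eqs. (22)-(24) rather than the authors' MATLAB code, with `mttkrp_t` a hypothetical precomputed $\mathbf{X}_{(t)}(\odot_{(t)}\mathbf{U})$.

```python
import numpy as np

def sosclus_update(U, t, mttkrp_t, eta, lam):
    """One SOSClus step for factor t: weighted average of the current
    solution and the OPT solution, cf. Eqs. (22)-(24)."""
    K = U[t].shape[1]
    gamma = np.ones((K, K))
    for i, Ui in enumerate(U):
        if i != t:
            gamma *= Ui.T @ Ui            # Gamma^(t), shape (K, K)
    # OPT solution of Eq. (23): X_(t)(Khatri-Rao)(Gamma^(t) + lam I)^(-1)
    U_opt = mttkrp_t @ np.linalg.inv(gamma + lam * np.eye(K))
    # Weighted average of Eq. (24); eta = 1 recovers the plain OPT solution
    U[t] = (1.0 - eta) * U[t] + eta * U_opt
    U[t] = U[t] / U[t].sum(axis=1, keepdims=True)   # row normalization, Eq. (19)
    return U[t]
```

The only extra cost over SGDClus is the $K \times K$ inverse, which Section 4.5 argues is negligible because $K \ll N$.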

4.4. Feasibility Analysis

Theorem 4. The CP decomposition of $\mathcal{X}$ obtains the clustering of multityped objects in the heterogeneous information network $G = (V, E)$ simultaneously.

Proof. Since the proofs for the different types of objects in the heterogeneous information network $G = (V, E)$ are similar, we simply describe the process for a single type of objects. Without loss of generality, we detail the proof for the $t$th type of objects.

Given a heterogeneous information network $G = (V, E)$ and its tensor representation $\mathcal{X}$, the nonzero elements in $\mathcal{X}$ represent the input gene-networks in $G = (V, E)$, which we want to partition into $K$ clusters $\mathcal{C}_1, \mathcal{C}_2, \ldots, \mathcal{C}_K$. The centre of the cluster $\mathcal{C}_k$ is denoted by $\mathbf{c}_k$. By using the coordinate format [47] as the sparse representation of $\mathcal{X}$, the gene-networks can be denoted as a matrix $\mathbf{M} \in \mathbb{R}^{nnz(\mathcal{X}) \times T}$, where $nnz(\mathcal{X})$ is the number of nonzero elements in $\mathcal{X}$. Each row $\mathbf{m}_n \in \mathbf{M}$, $n = 1, 2, \ldots, nnz(\mathcal{X})$, gives the subscripts of the corresponding nonzero element in $\mathcal{X}$. In other words, $\mathbf{m}_n$ represents a gene-network, and the entries $m_{nt} \in \mathbf{m}_n$, $t = 1, 2, \ldots, T$, are the subscripts of the objects contained in the gene-network.

The traditional clustering approach, such as K-means, minimizes the sum of differences between the individual gene-networks in each cluster and the corresponding cluster centres, that is,

$$\min_{p_{nk},\, \mathbf{c}_k}\; \sum_{n=1}^{nnz(\mathcal{X})} \left\| \mathbf{m}_n - \sum_{k=1}^{K} p_{nk}\,\mathbf{c}_k \right\|_F^2, \quad (25)$$

where $p_{nk}$ is the probability of gene-network $\mathbf{m}_n$ belonging to the $k$th cluster. Also, we can rewrite the problem from a new perspective of clustering the individual objects in the gene-networks as follows:

$$\min_{p_{n_t k},\, c_{kt}}\; \sum_{n=1}^{nnz(\mathcal{X})} \sum_{t=1}^{T} \left\| m_{nt} - \sum_{k=1}^{K} p_{n_t k}\, c_{kt} \right\|_F^2, \quad (26)$$

where $p_{n_t k}$ is the probability of object $v^t_n$ belonging to the $k$th cluster.

In the matrix form, K-means can be formalized as

$$\min_{\mathbf{P},\, \mathbf{C}}\; \left\| \mathbf{M} - \mathbf{P}\mathbf{C} \right\|_F^2, \quad (27)$$

where $\mathbf{P}$ is the cluster indication matrix and $\mathbf{C}$ is the cluster centres.

By matricization of $\mathcal{X}$ along the $t$th mode, the CP decomposition of $\mathcal{X}$ in (3) can be rewritten as

$$\min_{\mathbf{U}^{(1)}, \mathbf{U}^{(2)}, \ldots, \mathbf{U}^{(T)}}\; \left\| \mathbf{X}_{(t)} - \mathbf{U}^{(t)}\left(\odot_{(t)}\mathbf{U}\right)^{\top} \right\|_F^2. \quad (28)$$

$\mathbf{X}_{(t)}$ is the matricization of $\mathcal{X}$ along the $t$th mode, and $\mathbf{M}$ is the sparse representation of $\mathcal{X}$. Let $\mathbf{U}^{(t)} = \mathbf{P}$ and $(\odot_{(t)}\mathbf{U})^{\top} = \mathbf{C}$; that is, $\mathbf{U}^{(t)}$ is the cluster indication matrix for the $t$th type of objects, and $(\odot_{(t)}\mathbf{U})^{\top}$ is the cluster centres. So the CP decomposition in (3) is equivalent to the K-means clustering for the $t$th type of objects in the heterogeneous information network $G = (V, E)$.

By matricization of $\mathcal{X}$ in (3) along different modes, we can prove that the CP decomposition is equivalent to the K-means clustering for the other types of objects. So the CP decomposition of $\mathcal{X}$ obtains the clustering of multityped objects in the heterogeneous information network $G = (V, E)$ simultaneously.

It is worth noting that the CP decomposition based clustering is a soft clustering method: the factor matrices indicate the probability of objects belonging to the corresponding clusters. Soft clustering is more in line with reality, because many objects in heterogeneous information networks may belong to several clusters; in other words, the clusters are overlapping. In some cases, the overlapping clusters need to be translated into nonoverlapping clusters, which can be achieved by different approaches, such as applying K-means to the rows of the factor matrices. Usually, the nonoverlapping clusters can be obtained by simply assigning each object to the cluster with the largest entry in the corresponding row of the factor matrix, as the sketch below illustrates.
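A minimal sketch of this hard assignment, assuming a row-normalized factor matrix as produced by Eq. (19):

```python
import numpy as np

def hard_assignment(U_t):
    """U_t: (N_t, K) row-normalized factor matrix for one object type.
    Returns, per object, the index of the cluster with the largest entry."""
    return np.argmax(U_t, axis=1)
```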

4.5. Time Complexity Analysis. The main time consumption of updating each factor matrix $\mathbf{U}^{(t)}$ in SGDClus is computing $\partial\mathcal{L}/\partial\mathbf{U}^{(t)}$. According to (17), we need to calculate $\mathbf{X}_{(t)}(\odot_{(t)}\mathbf{U})$ and $\mathbf{U}^{(t)}\boldsymbol{\Gamma}^{(t)}$, respectively. Since $\mathbf{U}^{(t)} \in \mathbb{R}^{N_t \times K}$ and $\mathbf{X}_{(t)} \in \mathbb{R}^{N_t \times \prod_{i=1, i \neq t}^{T} N_i}$, we have $\mathbf{X}_{(t)}(\odot_{(t)}\mathbf{U}) \in \mathbb{R}^{N_t \times K}$, $\boldsymbol{\Gamma}^{(t)} \in \mathbb{R}^{K \times K}$, and $\mathbf{U}^{(t)}\boldsymbol{\Gamma}^{(t)} \in \mathbb{R}^{N_t \times K}$.

Firstly, if we successively calculate the Khatri-Rao product of $T-1$ matrices followed by a matrix-matrix multiplication when computing $\mathbf{X}_{(t)}(\odot_{(t)}\mathbf{U})$, the intermediate results will be of very large size and the computational cost will be very expensive. In practice, we can reduce the complexity by ignoring the unnecessary calculations. Let us observe an element of $\mathbf{X}_{(t)}(\odot_{(t)}\mathbf{U})$, that is,

$$\left(\mathbf{X}_{(t)}\left(\odot_{(t)}\mathbf{U}\right)\right)_{n_t k} = \sum_{n_i,\; i=1,\, i \neq t}^{T} \left( x_{n_t,\, \prod_{i=1, i \neq t}^{T} n_i} \prod_{i=1,\, i \neq t}^{T} u^{(i)}_{n_i k} \right). \quad (29)$$

Here $x_{n_t,\, \prod_{i=1, i \neq t}^{T} n_i}$ is an element of the matricization of $\mathcal{X}$, which represents a corresponding gene-network in the heterogeneous information network. When $x_{n_t,\, \prod_{i=1, i \neq t}^{T} n_i} = 0$, we can ignore the following Khatri-Rao product. Hence, only the nonzero elements in $\mathcal{X}$ need to be computed. Therefore, the time complexity for computing $\mathbf{X}_{(t)}(\odot_{(t)}\mathbf{U})$ is $O(nnz(\mathcal{X}) N_t K)$; a sparse implementation in this spirit is sketched below.
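The following NumPy sketch iterates only over the $nnz(\mathcal{X})$ nonzero elements stored in coordinate (COO) format; the array layout is an assumption for illustration, not the MATLAB Tensor Toolbox's internal representation.

```python
import numpy as np

def sparse_mttkrp(subs, vals, U, t):
    """Compute X_(t)(Khatri-Rao of all factors except t), cf. Eq. (29).

    subs : (nnz, T) integer array; row n holds the subscripts of the n-th
           nonzero element of the tensor (i.e., one gene-network)
    vals : (nnz,) values of the nonzero elements
    U    : list of T factor matrices, U[i] has shape (N_i, K)
    """
    N_t, K = U[t].shape
    out = np.zeros((N_t, K))
    for n in range(subs.shape[0]):
        # Accumulate vals[n] * prod_{i != t} u^(i)_{subs[n, i], k} for all k
        row = vals[n] * np.ones(K)
        for i in range(len(U)):
            if i != t:
                row *= U[i][subs[n, i], :]
        out[subs[n, t], :] += row
    return out
```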

Secondly, the element of $\mathbf{U}^{(t)}\boldsymbol{\Gamma}^{(t)}$ is given by

$$\left(\mathbf{U}^{(t)}\boldsymbol{\Gamma}^{(t)}\right)_{n_t k} = \sum_{j=1}^{K} \left( u^{(t)}_{n_t j} \prod_{i=1,\, i \neq t}^{T} \sum_{n_i=1}^{N_i} u^{(i)}_{n_i j}\, u^{(i)}_{n_i k} \right). \quad (30)$$

So the time complexity of computing $\mathbf{U}^{(t)}\boldsymbol{\Gamma}^{(t)}$ is $O((N - N_t)K^2)$, where $N = \sum_{t=1}^{T} N_t$ is the total number of objects in the networks.
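To make the $O((N - N_t)K^2)$ claim concrete, a short sketch under the same NumPy conventions as above: $\boldsymbol{\Gamma}^{(t)}$ is, by Eq. (30), the Hadamard product of the $K \times K$ Gram matrices of all factors except the $t$th, and each Gram matrix $\mathbf{U}^{(i)\top}\mathbf{U}^{(i)}$ costs $O(N_i K^2)$.

```python
import numpy as np

def gram_hadamard(U, t):
    """Gamma^(t): Hadamard product of Gram matrices of all factors except t."""
    K = U[t].shape[1]
    gamma = np.ones((K, K))
    for i, Ui in enumerate(U):
        if i != t:
            gamma *= Ui.T @ Ui   # each term costs O(N_i K^2)
    return gamma
```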

Above all, the time complexity of each iteration in SGDClus is $O(nnz(\mathcal{X})NK + (T-1)NK^2)$. Note that in real-world heterogeneous information networks, the number of clusters $K$ and the number of object types $T$ are usually far less than $N$; that is, $K \ll N$ and $T \ll N$.

According to (22), the time complexity of updating each factor matrix $\mathbf{U}^{(t)}$ in SOSClus is composed of three components: $\mathbf{X}_{(t)}(\odot_{(t)}\mathbf{U})$, $(\boldsymbol{\Gamma}^{(t)} + \lambda\mathbf{I})^{-1}$, and the product of them. Compared to SGDClus, only the time consumption of computing the inverse of $(\boldsymbol{\Gamma}^{(t)} + \lambda\mathbf{I})$ is additional in SOSClus. Since $(\boldsymbol{\Gamma}^{(t)} + \lambda\mathbf{I})$ is a $K \times K$ square matrix, computing its inverse costs $O(K^3)$. Nevertheless, $O(K^3)$ is usually negligible because $K \ll N$.

5. Experiments and Results

In this section, we present several experiments on synthetic and real-world datasets for heterogeneous information networks and compare the performance with a number of state-of-the-art clustering methods.

5.1. Evaluation Metrics and Experimental Setting

5.1.1. Evaluation Metrics. In the experiments, we adopt the Normalized Mutual Information (NMI) [48] and Accuracy (AC) as our performance measurements.

NMI is used to measure the mutual dependence between the clustering result and the ground truth. Given $N$ objects, $K$ clusters, one clustering result, and the ground truth classes for the objects, let $n(i, j)$, $i, j = 1, 2, \ldots, K$, be the number of objects labeled $i$ in the clustering result but labeled $j$ in the ground truth. The joint distribution can be defined as $p(i, j) = n(i, j)/N$, the marginal distribution of rows can be calculated as $p_1(j) = \sum_{i=1}^{K} p(i, j)$, and the marginal distribution of columns can be calculated as $p_2(i) = \sum_{j=1}^{K} p(i, j)$. Then the NMI is defined as

$$\mathrm{NMI} = \frac{\sum_{i=1}^{K}\sum_{j=1}^{K} p(i, j) \log\left(p(i, j) / \left(p_1(j)\, p_2(i)\right)\right)}{\sqrt{\sum_{j=1}^{K} p_1(j)\log p_1(j) \cdot \sum_{i=1}^{K} p_2(i)\log p_2(i)}}. \quad (31)$$

The NMI ranges from 0 to 1: the larger the value of NMI, the better the clustering result.

AC is used to compute the clustering accuracy, which measures the percentage of correct clustering results. AC is defined as

$$\mathrm{AC} = \frac{\sum_{t=1}^{T}\sum_{n=1}^{N_t} \delta\left(\mathrm{map}(v^t_n), \mathrm{label}(v^t_n)\right)}{\sum_{t=1}^{T} N_t}, \quad (32)$$

where $\mathrm{map}(v^t_n)$ is the cluster label of the object $v^t_n$ and $\mathrm{label}(v^t_n)$ is the ground truth class of the object $v^t_n$. And $\delta(\cdot)$ is an indicator function:

$$\delta(\cdot) = \begin{cases} 1, & \text{if } \mathrm{map}(v^t_n) = \mathrm{label}(v^t_n), \\ 0, & \text{if } \mathrm{map}(v^t_n) \neq \mathrm{label}(v^t_n). \end{cases} \quad (33)$$

Since both NMI and AC measure the performance of clustering a single type of objects, their weighted averages are also used to measure the overall performance of the proposed framework and the other state-of-the-art methods:

$$\overline{\mathrm{NMI}} = \frac{\sum_{t=1}^{T} N_t\, (\mathrm{NMI})_t}{\sum_{t=1}^{T} N_t}, \qquad \overline{\mathrm{AC}} = \frac{\sum_{t=1}^{T} N_t\, (\mathrm{AC})_t}{\sum_{t=1}^{T} N_t}. \quad (34)$$
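A sketch of Eqs. (32)-(34). It assumes the cluster-to-class mapping $\mathrm{map}(\cdot)$ has already been resolved (e.g., by matching each cluster to its majority class), so `maps` and `labels` hold comparable integer labels per object type.

```python
import numpy as np

def accuracy(maps, labels):
    """Eq. (32): maps, labels are lists (one pair per object type) of
    equal-length integer arrays over the labeled objects of that type."""
    correct = sum(int(np.sum(m == l)) for m, l in zip(maps, labels))
    total = sum(len(m) for m in maps)
    return correct / total

def weighted_average(per_type_scores, sizes):
    """Eq. (34): weight each type's NMI or AC by its object count N_t."""
    return float(np.dot(per_type_scores, sizes) / np.sum(sizes))
```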

Table 1: The synthetic datasets.

| Synthetic datasets | $T$ | $K$ | $S$ | $D$ |
| Syn1 | 2 | 2 | 1M = 1000 × 1000 | 0.1 |
| Syn2 | 2 | 4 | 10M = 1000 × 10000 | 0.01 |
| Syn3 | 4 | 2 | 100M = 100 × 100 × 100 × 100 | 0.1 |
| Syn4 | 4 | 4 | 1000M = 100 × 100 × 100 × 1000 | 0.01 |

$T$ is the number of object types in the heterogeneous information network and also the number of modes in the tensor. $K$ is the number of clusters. $S$ is the network scale, and $S = N_1 \times N_2 \times \cdots \times N_T$. $D$ is the density of the tensor, that is, the percentage of nonzero elements in the tensor, and $D = nnz(\mathcal{X})/S$.

5.1.2. Experimental Setting. In order to compare the performance of our proposed SGDClus and SOSClus with the others impartially, all methods share a common stopping condition, that is,

$$\frac{\left|\mathcal{L}_{\mathrm{iter}} - \mathcal{L}_{\mathrm{iter}-1}\right|}{\mathcal{L}_{\mathrm{iter}-1}} \le 10^{-6}, \quad (35)$$

or iter reaches the maximum number of iterations. $\mathcal{L}_{\mathrm{iter}}$ and $\mathcal{L}_{\mathrm{iter}-1}$ are the values of the function $\mathcal{L}$ at the current ((iter)th) iteration and the previous ((iter $-$ 1)th) iteration, respectively. We set the maximum number of iterations to 1000. Throughout the experiments, the regularization parameter $\lambda$ in SGDClus and SOSClus is fixed as $\lambda = 0.001$.
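A sketch of this stopping rule as an outer loop; `update_all_factors` and `loss` are hypothetical stand-ins for one sweep of SGDClus or SOSClus over all factor matrices and for the objective $\mathcal{L}$.

```python
def run_until_converged(update_all_factors, loss, max_iter=1000, tol=1e-6):
    """Iterate until the relative change of the loss satisfies Eq. (35)."""
    prev = loss()
    it = 0
    for it in range(1, max_iter + 1):
        update_all_factors(it)     # one sweep over t = 1, ..., T
        cur = loss()
        if abs(cur - prev) / abs(prev) <= tol:
            break                  # relative change below 1e-6
        prev = cur
    return it
```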

All experiments are implemented in MATLAB R2015a (version 8.5.0), 64-bit, and the MATLAB Tensor Toolbox (version 2.6, http://www.sandia.gov/~tgkolda/TensorToolbox) is used in our experiments. Since heterogeneous information networks are often sparse in real-world scenarios, that is, most elements in the tensor $\mathcal{X}$ are zeros, we use the sparse format of $\mathcal{X}$ as proposed in [47], which is supported by the MATLAB Tensor Toolbox. The experimental results are the average values obtained by running the algorithms ten times on the corresponding datasets.

5.2. Experiments on Synthetic Datasets

5.2.1. The Synthetic Datasets Description. The purpose of using synthetic datasets is to examine whether the proposed tensor CP decomposition clustering framework works well, since the detailed cluster structures of the synthetic datasets are known. In order to make the synthetic datasets similar to a realistic situation, we assume that the distribution of the different types of objects appearing in a gene-network follows Zipf's law (see details online: https://en.wikipedia.org/wiki/Zipf's_law). Zipf's law is defined by $f(r; \rho, N) = r^{-\rho} / \sum_{n=1}^{N} n^{-\rho}$, where $N$ is the number of objects, $r$ is the object index, and $\rho$ is the parameter characterizing the distribution; $f(r; \rho, N)$ denotes the frequency of the $r$th object appearing in the gene-networks. We set $\rho = 0.95$ and generate 4 synthetic datasets with different parameters. The details of these synthetic datasets are shown in Table 1, and a sketch of the generation procedure follows.
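The paper does not spell out every detail of the generator, so the following is one plausible reading: for each mode, subscripts of the nonzero elements (gene-networks) are drawn independently from the Zipf distribution above.

```python
import numpy as np

def zipf_probs(N, rho=0.95):
    """f(r; rho, N) = r^(-rho) / sum_n n^(-rho) for r = 1, ..., N."""
    ranks = np.arange(1, N + 1)
    w = ranks ** (-rho)
    return w / w.sum()

def generate_gene_networks(sizes, nnz, rho=0.95, seed=0):
    """sizes: (N_1, ..., N_T); returns an (nnz, T) array of subscripts,
    one row per synthetic gene-network (nonzero tensor element).
    Duplicate rows may occur and would be merged in a real generator."""
    rng = np.random.default_rng(seed)
    cols = [rng.choice(N, size=nnz, p=zipf_probs(N, rho)) for N in sizes]
    return np.stack(cols, axis=1)
```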

5.2.2. Experimental Results. In the beginning of the experiments, we set a common learning rate $\eta = 1/(\mathrm{iter} + 1)$ for SGDClus and SOSClus. We find that SOSClus has a faster convergence speed and better robustness with respect to the learning rate, which is clearly shown in Figure 3. Although SGDClus may eventually converge to a local minimum, the efficiency of the optimization near a local minimum is not all roses: as shown in Figure 3, the solutions of SGDClus swing around a local minimum. This phenomenon shows that the convergence speed of SGDClus is sensitive to the choice of the learning rate $\eta$.

Figure 3: Convergence speed (error versus iteration) of SGDClus and SOSClus on the 4 synthetic datasets with a common learning rate, that is, $\eta = 1/(\mathrm{iter} + 1)$.

Then we modify the learning rate to $\eta = 1/(\mathrm{iter} + c)$ for SGDClus, where $c$ is a constant optimized in the experiments; in practice, $c = 2.7855$ for SGDClus running on Syn3 and $c = 43.0245$ for SGDClus running on Syn4. The performance comparison of SGDClus with learning rate $\eta = 1/(\mathrm{iter} + c)$ and SOSClus with learning rate $\eta = 1/(\mathrm{iter} + 1)$ on Syn3 and Syn4 is shown in Figure 4. By employing the optimized learning rate, SGDClus converges to a local minimum quickly. However, compared to SOSClus, SGDClus still has no advantage: the hand-drawn blue circles over the curves of SOSClus in Figure 4 show that SOSClus can escape from a local minimum and find the global minimum, while SGDClus merely settles in the first local minimum it reaches.

Figure 4: Convergence speed of SGDClus and SOSClus on Syn3 and Syn4 with different learning rates; circles mark where SOSClus escaped from a local minimum. The learning rate for SOSClus is $\eta = 1/(\mathrm{iter} + 1)$, while that for SGDClus is $\eta = 1/(\mathrm{iter} + c)$, where $c$ is a constant optimized in the experiments.

According to (23) and (24), we additionally obtain the solutions of OPT by running SOSClus. So we compare the AC and NMI of OPT, SGDClus, and SOSClus on the 4 synthetic datasets, which are shown in Figure 5. With the increase of object types in the heterogeneous information networks, the AC and NMI of SOSClus and OPT increase distinctly, while the performance of SGDClus almost does not change. When $T = 2$, the AC and NMI of these three methods on Syn1 and Syn2 are almost equal and low; however, the AC and NMI of SOSClus increase to 1 when $T = 4$. Since the histograms of OPT, SGDClus, and SOSClus on Syn1 and Syn2 are almost the same, and those on Syn3 and Syn4 are also similar, we conclude that the parameters $K$ and $S$ have no significant effect on the performance. Generally, a larger density $D$ and a larger number of object types $T$ in the network result in higher AC and NMI for SOSClus.

Figure 5: The AC and NMI of OPT, SGDClus, and SOSClus on the 4 synthetic datasets.

Obviously, in the experiments on the 4 synthetic datasets, SOSClus shows excellent performance: it has a faster convergence speed and better robustness with respect to the learning rate. Meanwhile, SOSClus performs better on AC and NMI because it can escape from a local minimum and find the global minimum.

5.3. Performance Comparison on Real-World Dataset

5.3.1. Real-World Dataset Description. The experiments on the real-world dataset are used to compare the performance of the tensor CP decomposition clustering framework with other state-of-the-art methods.

The real-world dataset extracted from the DBLP database is the DBLP-four-area dataset, which can be downloaded from http://web.cs.ucla.edu/~yzsun/data/DBLP_four_area.zip. It is a four-research-area subset of DBLP and has been used in [2, 3, 12, 13, 15, 16, 18]. The four research areas in the DBLP-four-area dataset are database (DB), data mining (DM), machine learning (ML), and information retrieval (IR), respectively. There are five representative conferences in each area, and all related authors, the papers published in these conferences, and the terms contained in these papers' titles are included. The DBLP-four-area dataset contains 14376 papers with 100 labeled, 14475 authors with 4057 labeled, 20 labeled conferences, and 8920 terms. The density of the DBLP-four-area dataset is $9.01935 \times 10^{-9}$, so we construct a 4-mode tensor of size $14376 \times 14475 \times 20 \times 8920$ with 334832 nonzero elements. We compare the performance of the tensor CP decomposition clustering framework with several other methods on the labeled records in this dataset, building the tensor as sketched below.
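A sketch of assembling the COO representation of this tensor; the record format (one (author, paper, conference, term) co-occurrence per tuple) and the function name are hypothetical, chosen only to illustrate how each gene-network becomes one nonzero element.

```python
import numpy as np

def build_tensor(records, counts=(14475, 14376, 20, 8920)):
    """records: list of (author_id, paper_id, conf_id, term_id) tuples,
    one per gene-network; returns the COO subscripts and values."""
    subs = np.array(records, dtype=np.int64)   # (nnz, 4) subscripts
    vals = np.ones(subs.shape[0])              # binary incidence tensor
    # Sanity check: zero-based ids must fit the mode sizes (A, P, C, T)
    assert all(subs[:, m].max() < counts[m] for m in range(4))
    return subs, vals
```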

5.3.2. Comparative Methods

(i) NetClus (see [2]). An extended version of RankClus [1], which can deal with networks following the star network schema. The time complexity of NetClus for clustering each object type in each iteration is $O(K|E| + (K^2 + K)N)$, where $K$ is the number of clusters, $|E|$ is the number of edges in the network, and $N$ is the total number of objects in the network.

(ii) PathSelClus (see [15, 16]). A clustering method based on predefined metapaths, which requires user guidance. In PathSelClus, the distance between objects of the same type is measured by PathSim [3], and the method starts with the seeds given by the user. The time complexity of PathSelClus for clustering each object type in each iteration is $O((K + 1)|\mathcal{P}| + KN)$, where $|\mathcal{P}|$ is the number of metapath instances in the network. And the time complexity of PathSim used by PathSelClus for clustering each object type is $O(Nd)$, where $d$ is the average degree of the objects.

(iii) FctClus (see [13]). A recently proposed clustering method for heterogeneous information networks. As with NetClus, the FctClus method can deal with networks following the star network schema. The time complexity of FctClus for clustering each object type in each iteration is $O(K|E| + TKN)$.

5.3.3. Experimental Results. As the baseline methods can only deal with heterogeneous information networks of specific schemas, we must construct different subnetworks for them. For NetClus and FctClus, the network is organized as a star network schema as in [2, 13], where paper (P) is the centre type and author (A), conference (C), and term (T) are the attribute types.

Table 2: AC of experiments on DBLP-four-area dataset.

| AC | OPT | SGDClus | SOSClus | NetClus | PathSelClus | FctClus |
| Paper | 0.5882 | 0.8476 | 0.9007 | 0.7154 | 0.7551 | 0.7887 |
| Author | 0.5872 | 0.8486 | 0.9486 | 0.7177 | 0.7951 | 0.8008 |
| Conference | 1 | 0.99 | 1 | 0.9172 | 0.9950 | 0.9031 |
| avg(AC) | 0.5892 | 0.8493 | 0.9477 | 0.7186 | 0.7951 | 0.8010 |

Table 3: NMI of experiments on DBLP-four-area dataset.

| NMI | OPT | SGDClus | SOSClus | NetClus | PathSelClus | FctClus |
| Paper | 0.6557 | 0.6720 | 0.8812 | 0.5402 | 0.6142 | 0.7152 |
| Author | 0.6539 | 0.8872 | 0.8822 | 0.5488 | 0.6770 | 0.6012 |
| Conference | 1 | 0.8497 | 1 | 0.8858 | 0.9906 | 0.8248 |
| avg(NMI) | 0.6556 | 0.8778 | 0.8827 | 0.5503 | 0.6770 | 0.6050 |

Table 4: Running time of experiments on DBLP-four-area dataset.

| Running time (s) | OPT | SGDClus | SOSClus | NetClus | PathSelClus | FctClus |
| Paper | — | — | — | 8026 | 5423 | 8084 |
| Author | — | — | — | 7437 | 6811 | 7749 |
| Conference | — | — | — | 6584 | 6293 | 6698 |
| Total time | 6726 | 4324 | 8184 | 22047 | 18527 | 22531 |

For PathSelClus, we select the metapaths P-T-P, A-P-C-P-A, and C-P-T-P-C to cluster the papers, authors, and conferences, respectively. And in PathSelClus, we give each cluster one seed to start.

We model the DBLP-four-area dataset as a 4-mode tensor, where each mode represents one object type. The 4 modes are author (A), paper (P), conference (C), and term (T), respectively; actually, the order of the object types is insignificant. Each element of the tensor represents a gene-network in the heterogeneous information network. In the experiments, we set the learning rate for SOSClus to be $\eta = 1/(\mathrm{iter}+1)$ and an optimized learning rate $\eta = 1/(\mathrm{iter}+c)$ with $c = 100.0125$ for SGDClus. By running SOSClus, we additionally obtain the solutions of OPT. So we compare the experimental results of OPT, SGDClus, and SOSClus on the DBLP-four-area dataset with the three baseline methods; see the details in Tables 2, 3, and 4.

In Tables 2 and 3, SOSClus performs best on average AC and NMI, and SGDClus takes second place. All methods achieve satisfactory AC and NMI on the conferences, since there are only 20 conferences in the network. SGDClus takes the shortest running time, and OPT and SOSClus have an obvious advantage in running time compared with the other baselines. The time complexity of SGDClus is $O(nnz(\mathcal{X})NK + (T-1)K^2N)$, and the time complexity of SOSClus is $O(nnz(\mathcal{X})NK + (T-1)K^2N + K^3)$, where $nnz(\mathcal{X})$ is the number of nonzero elements in $\mathcal{X}$, that is, the number of gene-networks in the heterogeneous information network. We have $nnz(\mathcal{X}) < |\mathcal{P}| \ll |E|$, $K \ll N$, and $T \ll N$. Compared with the time complexities of the three baselines, SGDClus and SOSClus have a slight disadvantage. However, it is worth noting that the three baselines can only cluster one type of objects in each run, while OPT, SGDClus, and SOSClus obtain the clusters of all types of objects simultaneously by running once. This is the reason why only the total time is shown for OPT, SGDClus, and SOSClus in Table 4. Moreover, the total time of OPT, SGDClus, and SOSClus is on the same order of magnitude as the running time of the other baselines for clustering each single type of objects, which is consistent with the comparison of the time complexities.

6. Conclusion

In this paper, a tensor CP decomposition method for clustering heterogeneous information networks is presented. In the tensor CP decomposition clustering framework, each type of objects in the heterogeneous information network is modeled as one mode of a tensor, and the gene-networks in the network are modeled as the elements of the tensor. In other words, the tensor CP decomposition clustering framework can model different types of objects and semantic relations in the heterogeneous information network without the restriction of a network schema. In addition, two stochastic gradient descent algorithms, named SGDClus and SOSClus, are designed. SGDClus and SOSClus can cluster all types of objects and the gene-networks simultaneously by running once. The proposed algorithms outperformed other state-of-the-art clustering methods in terms of AC, NMI, and running time.

Notations

$a$: A scalar (lowercase letter)
$\mathbf{a}$: A vector (boldface lowercase letter)
$\mathbf{A}$: A matrix (boldface capital letter)
$\mathcal{X}$: A tensor (calligraphic letter)
$x_{i_1 i_2 \cdots i_N}$: The $(i_1, i_2, \ldots, i_N)$th element of an $N$th-order tensor $\mathcal{X}$
$\mathbf{X}_{(n)}$: Matricization of $\mathcal{X}$ along the $n$th mode
$\ast$: Hadamard product (element-wise product) of two tensors (or matrices, or vectors) with the same dimensions
$\odot$: Khatri-Rao product of two matrices
$\|\cdot\|_F$: Frobenius norm of a tensor (or matrix, or vector)
$\mathbf{a} \circ \mathbf{b}$: Outer product of two vectors

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported by the National Natural Science Foundation of China (no. 61401482 and no. 61401483).

References

[1] Y. Sun, J. Han, P. Zhao, Z. Yin, H. Cheng, and T. Wu, "RankClus: integrating clustering with ranking for heterogeneous information network analysis," in Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology (EDBT '09), pp. 565-576, Saint Petersburg, Russia, March 2009.

[2] Y. Sun, Y. Yu, and J. Han, "Ranking-based clustering of heterogeneous information networks with star network schema," in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '09), pp. 797-805, Paris, France, July 2009.

[3] Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu, "PathSim: meta path-based top-K similarity search in heterogeneous information networks," PVLDB, vol. 4, no. 11, pp. 992-1003, 2011.

[4] J. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics, pp. 281-297, University of California Press, Berkeley, Calif, USA, 1967.

[5] S. Guha, R. Rastogi, and K. Shim, "CURE: an efficient clustering algorithm for large databases," Information Systems, vol. 26, no. 1, pp. 35-58, 2001.

[6] M. Ester, H. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96), E. Simoudis, J. Han, and U. M. Fayyad, Eds., pp. 226-231, AAAI Press, Portland, Ore, USA, 1996.

[7] W. Wang, J. Yang, and R. R. Muntz, "STING: a statistical information grid approach to spatial data mining," in Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB '97), M. Jarke, M. J. Carey, K. R. Dittrich, F. H. Lochovsky, P. Loucopoulos, and M. A. Jeusfeld, Eds., pp. 186-195, Morgan Kaufmann, Athens, Greece, August 1997, http://www.vldb.org/conf/1997/P186.PDF.

[8] E. H. Ruspini, "New experimental results in fuzzy clustering," Information Sciences, vol. 6, no. 73, pp. 273-284, 1973.

[9] U. von Luxburg, "A tutorial on spectral clustering," Statistics and Computing, vol. 17, no. 4, pp. 395-416, 2007.

[10] C. Shi, Y. Li, J. Zhang, Y. Sun, and P. S. Yu, "A survey of heterogeneous information network analysis," IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 1, pp. 17-37, 2017.

[11] J. Han, Y. Sun, X. Yan, and P. S. Yu, "Mining knowledge from data: an information network analysis approach," in Proceedings of the IEEE 28th International Conference on Data Engineering (ICDE '12), pp. 1214-1217, IEEE, Washington, DC, USA, April 2012.

[12] J. Chen, W. Dai, Y. Sun, and J. Dy, "Clustering and ranking in heterogeneous information networks via gamma-poisson model," in Proceedings of the SIAM International Conference on Data Mining (SDM '15), pp. 424-432, May 2015.

[13] J. Yang, L. Chen, and J. Zhang, "FctClus: a fast clustering algorithm for heterogeneous information networks," PLoS ONE, vol. 10, no. 6, Article ID e0130086, 2015.

[14] C. Shi, R. Wang, Y. Li, P. S. Yu, and B. Wu, "Ranking-based clustering on general heterogeneous information networks by network projection," in Proceedings of the 23rd ACM International Conference on Information and Knowledge Management (CIKM '14), pp. 699-708, ACM, Shanghai, China, November 2014.

[15] Y. Sun, B. Norick, J. Han, X. Yan, P. S. Yu, and X. Yu, "Integrating meta-path selection with user-guided object clustering in heterogeneous information networks," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '12), pp. 1348-1356, ACM, Beijing, China, August 2012.

[16] Y. Sun, B. Norick, J. Han, X. Yan, P. S. Yu, and X. Yu, "PathSelClus: integrating meta-path selection with user-guided object clustering in heterogeneous information networks," ACM Transactions on Knowledge Discovery from Data, vol. 7, no. 3, pp. 723-724, 2013.

[17] X. Yu, Y. Sun, B. Norick, T. Mao, and J. Han, "User guided entity similarity search using meta-path selection in heterogeneous information networks," in Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM '12), pp. 2025-2029, ACM, November 2012.

[18] Y. Sun, C. C. Aggarwal, and J. Han, "Relation strength-aware clustering of heterogeneous information networks with incomplete attributes," Proceedings of the VLDB Endowment, vol. 5, no. 5, pp. 394-405, 2012.

[19] M. Zhang, H. Hu, Z. He, and W. Wang, "Top-k similarity search in heterogeneous information networks with x-star network schema," Expert Systems with Applications, vol. 42, no. 2, pp. 699-712, 2015.

[20] Y. Zhou and L. Liu, "Social influence based clustering of heterogeneous information networks," in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 338-346, ACM, Chicago, Ill, USA, August 2013.

[21] L. R. Tucker, "Some mathematical notes on three-mode factor analysis," Psychometrika, vol. 31, pp. 279-311, 1966.

[22] R. A. Harshman, "Foundations of the PARAFAC procedure: models and conditions for an 'explanatory' multi-mode factor analysis," UCLA Working Papers in Phonetics, 1969.

[23] H. A. L. Kiers, "Towards a standardized notation and terminology in multiway analysis," Journal of Chemometrics, vol. 14, no. 3, pp. 105-122, 2000.

[24] T. G. Kolda and B. W. Bader, "Tensor decompositions and applications," SIAM Review, vol. 51, no. 3, pp. 455-500, 2009.

[25] A. Cichocki, D. Mandic, L. De Lathauwer et al., "Tensor decompositions for signal processing applications: from two-way to multiway component analysis," IEEE Signal Processing Magazine, vol. 32, no. 2, pp. 145-163, 2015.

[26] W. Peng and T. Li, "Tensor clustering via adaptive subspace iteration," Intelligent Data Analysis, vol. 15, no. 5, pp. 695-713, 2011.

[27] J. Hastad, "Tensor rank is NP-complete," Journal of Algorithms: Cognition, Informatics and Logic, vol. 11, no. 4, pp. 644-654, 1990.

[28] S. Metzler and P. Miettinen, "Clustering Boolean tensors," Data Mining & Knowledge Discovery, vol. 29, no. 5, pp. 1343-1373, 2015.

[29] X. Cao, X. Wei, Y. Han, and D. Lin, "Robust face clustering via tensor decomposition," IEEE Transactions on Cybernetics, vol. 45, no. 11, pp. 2546-2557, 2015.

[30] I. Sutskever, R. Salakhutdinov, and J. B. Tenenbaum, "Modelling relational data using Bayesian clustered tensor factorization," in Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS '09), pp. 1821-1828, British Columbia, Canada, December 2009.

[31] B. Ermis, E. Acar, and A. T. Cemgil, "Link prediction in heterogeneous data via generalized coupled tensor factorization," Data Mining & Knowledge Discovery, vol. 29, no. 1, pp. 203-236, 2015.

[32] A. R. Benson, D. F. Gleich, and J. Leskovec, "Tensor spectral clustering for partitioning higher-order network structures," in Proceedings of the SIAM International Conference on Data Mining (SDM '15), pp. 118-126, Vancouver, Canada, May 2015.

[33] L. Xiong, X. Chen, T.-K. Huang, J. G. Schneider, and J. G. Carbonell, "Temporal collaborative filtering with Bayesian probabilistic tensor factorization," in Proceedings of the 10th SIAM International Conference on Data Mining (SDM '10), pp. 211-222, Columbus, Ohio, USA, May 2010.

[34] E. E. Papalexakis, L. Akoglu, and D. Ience, "Do more views of a graph help? Community detection and clustering in multi-graphs," in Proceedings of the 16th International Conference on Information Fusion (FUSION '13), pp. 899-905, Istanbul, Turkey, July 2013.

[35] M. Vandecappelle, M. Bousse, F. V. Eeghem, and L. De Lathauwer, "Tensor decompositions for graph clustering," Internal Report 16-170, ESAT-STADIUS, KU Leuven, Leuven, Belgium, 2016, ftp://ftp.esat.kuleuven.be/pub/SISTA/sistakulak/reports/2016_Tensor_Graph_Clustering.pdf.

[36] W. Shao, L. He, and P. S. Yu, "Clustering on multi-source incomplete data via tensor modeling and factorization," in Proceedings of the 19th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD '15), pp. 485-497, Ho Chi Minh City, Vietnam, 2015.

[37] J. D. Carroll and J.-J. Chang, "Analysis of individual differences in multidimensional scaling via an n-way generalization of 'Eckart-Young' decomposition," Psychometrika, vol. 35, no. 3, pp. 283-319, 1970.

[38] L. De Lathauwer, B. De Moor, and J. Vandewalle, "On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors," SIAM Journal on Matrix Analysis and Applications, vol. 21, no. 4, pp. 1324-1342, 2000.

[39] E. Acar, D. M. Dunlavy, and T. G. Kolda, "A scalable optimization approach for fitting canonical tensor decompositions," Journal of Chemometrics, vol. 25, no. 2, pp. 67-86, 2011.

[40] S. Hansen, T. Plantenga, and T. G. Kolda, "Newton-based optimization for Kullback-Leibler nonnegative tensor factorizations," Optimization Methods & Software, vol. 30, no. 5, pp. 1002-1029, 2015.

[41] N. Vervliet and L. De Lathauwer, "A randomized block sampling approach to canonical polyadic decomposition of large-scale tensors," IEEE Journal on Selected Topics in Signal Processing, vol. 10, no. 2, pp. 284-295, 2016.

[42] R. Ge, F. Huang, C. Jin, and Y. Yuan, "Escaping from saddle points: online stochastic gradient for tensor decomposition," http://arxiv.org/abs/1503.02101.

[43] T. G. Kolda, "Multilinear operators for higher-order decompositions," Tech. Rep. SAND2006-2081, Sandia National Laboratories, 2006, http://www.osti.gov/scitech/biblio/923081.

[44] P. Paatero, "A weighted non-negative least squares algorithm for three-way 'PARAFAC' factor analysis," Chemometrics & Intelligent Laboratory Systems, vol. 38, no. 2, pp. 223-242, 1997.

[45] P. Paatero, "Construction and analysis of degenerate PARAFAC models," Journal of Chemometrics, vol. 14, no. 3, pp. 285-299, 2000.

[46] L. Bottou and N. Murata, "Stochastic approximations and efficient learning," in The Handbook of Brain Theory and Neural Networks, 2nd edition, 2002.

[47] B. W. Bader and T. G. Kolda, "Efficient MATLAB computations with sparse and factored tensors," SIAM Journal on Scientific Computing, vol. 30, no. 1, pp. 205-231, 2007.

[48] A. Strehl and J. Ghosh, "Cluster ensembles: a knowledge reuse framework for combining multiple partitions," Journal of Machine Learning Research, vol. 3, no. 3, pp. 583-617, 2003.


Page 7: A Tensor CP Decomposition Method for Clustering ...downloads.hindawi.com/journals/sp/2017/2803091.pdf · spectral clustering, are designed to analyze homogeneous informationnetworks

Scientific Programming 7

we want to partition into 119870 clusters C1C2 C119870 Thecentre of the cluster C119896 is denoted by c119896 By using thecoordinate format [47] as the sparse representation ofX thegene-networks can be denoted as a matrix M isin R119899119899119911(X)times119879where 119899119899119911(X) is the number of nonzero elements inX Eachrow m119899 isin M 119899 = 1 2 119899119899119911(X) gives the subscripts ofcorresponding nonzero element in X In other words m119899

represents a gene-network and the entries 119898119899119905 isin m119899 119905 =1 2 119879 are the subscripts of the objects contained in thegene-network

The traditional clustering approach such as K-meansminimizes the sum of differences between individual gene-network in each cluster and the corresponding cluster centresthat is

min119901119899119896c119896

119899119899119911(X)sum119899=1

1003817100381710038171003817100381710038171003817100381710038171003817m119899 minus 119870sum119896=1

119901119899119896c11989610038171003817100381710038171003817100381710038171003817100381710038172

119865

(25)

where 119901119899119896 is the probability of gene-network m119899 belongingto the 119896th cluster Also we can rewrite the problem by anew perspective of clustering individual object in the gene-network as follows

min119901119899119905119896119888119896119905

119899119899119911(X)sum119899=1

119879sum119905=1

1003817100381710038171003817100381710038171003817100381710038171003817119898119899119905 minus119870sum119896=1

119901119899119905 11989611988811989611990510038171003817100381710038171003817100381710038171003817100381710038172

119865

(26)

where 119901119899119905 119896 is the probability of object V119905119899 belonging to the 119896thcluster

In the matrix form K-means can be formalized as

minPC

M minus PC2119865 (27)

where P is the cluster indication matrix and C is the clustercentres

By matricization of X along the 119905th mode the CPdecomposition ofX in (3) can be rewritten as

minU(1) U(2) U(119879)

100381710038171003817100381710038171003817X(119905) minus U(119905) (⊙(119905)U)⊤1003817100381710038171003817100381710038172119865 (28)

X(119905) is thematricization ofX along the 119905thmode andM is thesparse representation ofX LetU(119905) = P and let (⊙(119905)U)⊤ = CThat is U(119905) is the cluster indication matrix for the 119905th typeof objects and (⊙(119905)U)⊤ is the cluster centres So the CPdecomposition in (3) is equivalent to the K-means clusteringfor the 119905th type of objects in heterogeneous informationnetwork 119866 = (119881 119864)

By matricization of X in (3) along different modes wecan prove that the CP decomposition is equivalent to theK-means clustering for other types of objects So the CPdecomposition of X obtains the clustering of multitypedobjects in the heterogeneous information network119866 = (119881 119864)simultaneously

It is worth noting that the CP decomposition basedclustering is a soft clustering method the factor matricesindicate the probability of objects belonging to correspondingclusters The soft clustering is more in line with reality

because many objects in the heterogeneous informationnetworks may belong to several clusters In other wordsthe clusters are overlapping In some cases the overlappingclusters need to be translated into nonoverlapping clusterswhich can be achieved by using different approaches such asK-means to cluster the rows of factor matrices Usually thenonoverlapping clusters can be obtained by simply assigningeach object to the cluster which has the largest entry in thecorresponding row of factor matrix

45 Time Complexity Analysis Themain time consumptionof updating each factor matrix U(119905) in SGDClus is com-puting 120597L120597U(119905) According to (17) we need to calculateX(119905)(⊙(119905)U) and U(119905)Γ(119905) respectively Since U(119905) isin R119873119905times119870

and X(119905) isin R119873119905timesprod

119879

119894=1119894 =119905

119873119894

we have X(119905)(⊙(119905)U) isin R119873119905times119870Γ(119905) isin R119870times119870 and U(119905)Γ(119905)isinR119873119905times119870 Firstly if we successively calculate the Khatri-Rao prod-

uct of 119879 minus 1 matrices and a matrix-matrix multiplicationwhen computing X(119905)(⊙(119905)U) the intermediate results willbe of very large size and the computational cost will bevery expensive In practice we can reduce the complexityby ignoring the unnecessary calculation Let us observe theelement ofX(119905)(⊙(119905)U) that is

(X(119905) (⊙(119905)U))119899119905 119896

= sum119899119894119879

119894=1119894 =119905

(119909119899119905 prod119879119894=1119894 =119905

119899119894

119879prod119894=1119894 =119905

119906(119894)119899119894 119896) (29)

119909119899119905 prod119879119894=1119894 =119905

119899119894is an element in the matricization ofX which rep-

resents a corresponding gene-network in the heterogeneousinformation networkWhen 119909119899119905 prod119879119894=1

119894 =119905

119899119894= 0 we can ignore the

following Khatri-Rao product Hence only nonzero elementsinX need to be computedTherefore the time complexity forcomputingX(119905)(⊙(119905)U) is 119874(119899119899119911(X)119873119905119870)

Secondly the element of U(119905)Γ(119905) is given by

(U(119905)Γ(119905))119899119905 119896

= 119870sum119895=1

(119906(119905)119899119905 119895 119879prod119894=1119894 =119905

119873119894sum119899119894=1

119906(119894)119899119894 119895119906(119894)119899119894 119896) (30)

So the time complexity of computing U(119905)Γ(119905) is 119874((119873 minus119873119905)1198702) where 119873 = sum119879119905=1119873119905 is the total number of objectsin the networks

Above all the time complexity of each iteration inSGDClus is 119874(119899119899119911(X)119873119870 + (119879 minus 1)1198731198702) Note that in thereal-world heterogeneous information networks the numberof clusters119870 and the number of object types 119879 are usually farless than119873 that is119870 ≪ 119873 and 119879 ≪ 119873

According to (22) the time complexity of updatingeach factor matrix U(119905) in SOSClus is composed of threecomponents X(119905)(⊙(119905)U) (Γ(119905) + 120582I)minus1 and the product ofthem Compared to SGDClus only the time consumption ofcomputing an inverse matrix for (Γ(119905) + 120582I) is additional in

8 Scientific Programming

SOSClus Since (Γ(119905)+120582I) is a119870times119870 squarematrix computingthe inverse of such matrix costs 119874(1198703) Nevertheless 119874(1198703)is usually negligible because119870 ≪ 119873

5 Experiments and Results

In this section we present several experiments on syntheticand real-world datasets for heterogeneous information net-works and compare the performance with a number of state-of-the-art clustering methods

51 Evaluation Metrics and Experimental Setting

511 Evaluation Metrics In the experiments we adopt theNormalized Mutual Information (NMI) [48] and Accuracy(AC) as our performance measurements

NMI is used to measure the mutual dependence infor-mation between the clustering result and the ground truthGiven 119873 objects 119870 clusters one clustering result andthe ground truth classes for the objects let 119899(119894 119895) 119894 119895 =1 2 119870 be the number of objects that labeled 119894 in clus-tering result but labeled 119895 in the ground truth The jointdistribution can be defined as119901(119894 119895) = 119899(119894 119895)119873 themarginaldistribution of rows can be calculated as 1199011(119895) = sum119870119894=1 119901(119894 119895)and themarginal distribution of columns can be calculated as1199012(119894) = sum119870119895=1 119901(119894 119895) Then the NMI is defined as

NMI = sum119870119894=1sum119870119895=1 119901 (119894 119895) log (119901 (119894 119895) 1199011 (119895) 1199012 (119894))radicsum119870119895=1 1199011 (119895) log1199011 (119895)sum119870119894=1 1199012 (119894) log1199012 (119894) (31)

The NMI ranges from 0 to 1 the larger value of NMI thebetter the clustering result

AC is used to compute the clustering accuracy thatmeasures the percent of the correct clustering result AC isdefined as

AC = sum119879119905=1sum119873119905119899=1 120575 (map (120592119905119899) label (120592119905119899))sum119879119905=1119873119905 (32)

where map(120592119905119899) is the cluster label of the object 120592119905119899 and thelabel(120592119905119899) is the ground truth class of the object 120592119905119899 And 120575(sdot)is an indicator function

120575 (sdot) = 1 if map (120592119905119899) = label (120592119905119899) 0 if map (120592119905119899) = label (120592119905119899) (33)

Since both ofNMI andACare used tomeasure the perfor-mance of clustering one type of object the weighted averageNMI and AC are also used to measure the performance ofSTFClus and other state-of-the-art methods

NMI = sum119879119905=1119873119905 (NMI)119905sum119879119905=1119873119905 AC = sum119879119905=1119873119905 (AC)119905sum119879119905=1119873119905 (34)

Table 1 The synthetic datasets

Syntheticdatasets 119879 119870 119878 119863Syn1 2 2 1119872 = 1000 times 1000 01Syn2 2 4 10119872 = 1000 times 10000 001Syn3 4 2 100119872 = 100 times 100 times 100 times 100 01Syn4 4 4 1000119872 = 100 times 100 times 100 times 1000 001119879 is the number of object types in the heterogeneous information networkand also the number of modes in the tensor119870 is the number of clusters 119878 isthe network scale and 119878 = 1198731 times1198732 times sdot sdot sdot times119873119879119863 is the density of the tensorthat is the percentage of nonzero elements in the tensor and119863 = 119899119899119911(X)119878

512 Experimental Setting In order to compare the perfor-mance of our proposed SGDClus and SOSClus with othersimpartially all methods share a common stopping conditionthat is 1003816100381610038161003816Liter minusLiterminus1

1003816100381610038161003816Literminus1

le 10minus6 (35)

or iter reaches the maximum iterations Liter and Literminus1are the values of function L at the current that is (iter)thiteration and the previous that is (iter minus 1)th iterationrespectively And we set the maximum iterations to be 1000Throughout the experiments the regularization parameter 120582in SGDClus and SOSClus is fixed as 120582 = 0001

All experiments are implemented in theMATLABR2015a(version 850) 64-bit And the MATLAB Tensor Toolbox(version 26 httpwwwsandiagovsimtgkoldaTensorTool-box) is used in our experiments Since the heterogeneousinformation networks are often sparse in real-world scenar-ios that is most elements in the tensor X are zeros we usethe sparse format of X as proposed in [47] which has beensupported by MATLAB Tensor Toolbox The experimentalresults are the average values obtained by running thealgorithms ten times on corresponding datasets

52 Experiments on Synthetic Datasets

521 The Synthetic Datasets Description The purposeof using synthetic datasets is to examine whether theproposed tensor CP decomposition clustering frameworkcan work well since the detailed cluster structures of thesynthetic datasets are known In order to make the syntheticdatasets similar to a realistic situation we assume thatthe distribution for different types of objects that appearin a gene-network follows Zipf rsquos law (see details onlinehttpsenwikipediaorgwikiZipf rsquos_law) Zipf rsquos law isdefined by119891(119903 120588119873) = 119903minus120588sum119873119899=1 119899minus120588 where119873 is the numberof objects 119903 is the object index and 120588 is the parameter charac-terizing the distribution Zipf rsquos law denotes the frequency ofthe 119903th object appearing in the gene-networkWe set 120588 = 095and generate 4 synthetic datasets with different parametersThe details of these synthetic datasets are shown in Table 1

522 Experimental Results In the beginning of the experi-ments we set a common learning rate 120578 = 1(iter + 1) for

Scientific Programming 9

SGDClus on synthetic datasets

0 10 20 30 40 50

Iteration

Erro

r

Syn1Syn2

Syn3Syn4

000

005

010

015

020

025

030

035

SOSClus on synthetic datasets

0 10 20 30 40 50

Iteration

Syn1Syn2

Syn3Syn4

00

01

02

03

04

05

06

07

08

Erro

r

Figure 3 Convergence speed of SGDClus and SOSClus on the 4 synthetic datasets with a common learning rate that is 120578 = 1(iter + 1)SGDClus and SOSClus We find that SOSClus has a fasterconvergence speed and better robustness with respect to thelearning rate which is clearly shown in Figure 3 AlthoughSGDClus may eventually converge to a local minimum theefficiency of the optimization near a local minimum is not allroses As shown in Figure 3 the solutions of SGDClus swingaround a local minimum This phenomenon proves that theconvergence speed of SGDClus is sensitive to the choice oflearning rate 120578

Then we modify the learning rate to 120578 = 1(iter +119888) for SGDClus where 119888 is a constant optimized in theexperiments In practice 119888 = 27855 for SGDClus runningon Syn3 and 119888 = 430245 for SGDClus running on Syn4The performance comparison of SGDClus with learning rate120578 = 1(iter + 119888) and SOSClus with learning rate 120578 = 1(iter +1) on Syn3 and Syn4 is shown in Figure 4 By employingthe optimized learning rate SGDClus converges to a localminimumquicklyHowever compared to SOSClus SGDClusstill has no advantageThe hand drawing blue circles over thecurve of SOSClus in Figure 4 shows that SOSClus can escapefrom a local minimum and find the global minimum whileSGDClus just obtains the first reaching local minimum

According to (23) and (24) we accessorily obtain the solu-tions of OPT by running SOSClus So we compare the ACand NMI of OPT SGDClus and SOSClus on the 4 syntheticdatasets which are shown in Figure 5 With the increase ofobject types in the heterogeneous information networks theAC and NMI of SOSClus and OPT increase distinctly whileperformance of SGDClus almost does not changeWhen 119879 =2 AC and NMI of these three methods on Syn1 and Syn2are almost equal and low However AC and NMI of SOSClusincrease to 1 when 119879 = 4 Since the histograms of OPT SGD-Clus and SOSClus on Syn1 and Syn2 are almost the same andon Syn3 and Syn4 are also similar we know that the param-eters 119870 and 119878 have no significant effect on the performanceGenerally the larger density119863 and the number of object types119879 in the network result in higher AC and NMI of SOSClus

Obviously in the experiments on the 4 synthetic datasetsSOSClus shows an excellent performance SOSClus has afaster convergence speed and better robustness with respectto the learning rate Meanwhile SOSClus performs better onAC and NMI because it can escape from a local minimumand find the global minimum

53 Performance Comparison on Real-World Dataset

531 Real-World Dataset Description The experiments onthe real-world dataset are used to compare the performanceof the tensor CP decomposition clustering framework withother state-of-the-art methods

The real-world dataset extracted from the DBLP databaseis the DBLP-four-area dataset which can be downloadedfrom httpwebcsuclaedusimyzsundataDBLP_four_areazip It is a four research-area subset of DBLP and is usedin [2 3 12 13 15 16 18] The four research areas in DBLP-four-area dataset are database (DB) data mining (DM)machine learning (ML) and information retrieval (IR)respectively There are five representative conferences ineach area And all related authors papers published in theseconferences and terms contained in these papersrsquo titlesare included The DBLP-four-area dataset contains 14376papers with 100 labeled 14475 authors with 4057 labeled20 labeled conferences and 8920 terms The density of theDBLP-four-area dataset is 901935 times 10minus9 so we construct a4-mode tensor with size of 14376 times 14475 times 20 times 8920 and334832 nonzero elements We compare the performance oftensor CP decomposition clustering framework with severalother methods on the labeled record in this dataset

532 Comparative Methods

(i) NetClus (see [2]) An extended version of RankClus [1]which can deal with the network follows the star networkschema The time complexity of NetClus for clustering each

10 Scientific Programming

0 10 20 30 40 50

Iteration

00

01

02

03

04

05

06

07

08

Erro

rExperiments on Syn3

SGDClusSOSClus

Escaped from a local minimum

0 10 20 30 40 50

Iteration

00

01

02

03

04

05

06

07

Erro

r

Experiments on Syn4

SGDClusSOSClus

Escaped from a local minimum

Figure 4 Convergence speed of SGDClus and SOSClus on Syn3 and Syn4 with different learning rate The learning rate for SOSClus is120578 = 1(iter + 1) while that for SGDClus is 120578 = 1(iter + 119888) where 119888 is a constant optimized in the experiments

OPTSGDClusSOSClus

Synthetic datasets

00

02

04

06

08

10

AC

Syn1 Syn2 Syn4Syn3

OPTSGDClusSOSClus

NM

I

00

02

04

06

08

10

Synthetic datasetsSyn1 Syn2 Syn4Syn3

Figure 5 The AC and NMI of OPT SGDClus and SOSClus on the 4 synthetic datasets

object type in each iteration is 119874(119870|119864| + (1198702 + 119870)119873) where119870 is the number of clusters |119864| is the number of edges in thenetwork and119873 is the total number of objects in the network

(ii) PathSelClus (see [15 16]) A clustering method based onthe predefined metapath requires a user guide In PathSel-Clus the distance between the same type objects is measuredby PathSim [3] and themethod starts with the given seeds byuser The time complexity of PathSelClus for clustering eachobject type in each iteration is 119874((119870 + 1)|P| + 119870119873) where|P| is the number of metapath instances in the networkAnd the time complexity of PathSim used by PathSelClus forclustering each object type is 119874(119873119889) where 119889 is the averagedegree of objects

(iii) FctClus (see [13]) It is a recently proposed clusteringmethod for heterogeneous information networks As withNetClus the FctClus method can deal with networks follow-ing the star network schema The time complexity of FctClusfor clustering each object type in each iteration is 119874(119870|119864| +119879119870119873)533 Experimental Results As the baseline methods canonly deal with specific schema heterogeneous informationnetworks here we must construct different subnetworks forthem For NetClus and FctClus the network is organized as astar network schema like in [2 13] where the paper (P) is thecentre type and author (A) conference (C) and term (T) are

Scientific Programming 11

Table 2 AC of experiments on DBLP-four-area dataset

AC OPT SGDClus SOSClus NetClus PathSelClus FctClusPaper 05882 08476 09007 07154 07551 07887Author 05872 08486 09486 07177 07951 08008Conference 1 099 1 09172 09950 09031avg (AC) 05892 08493 09477 07186 07951 08010

Table 3 NMI of experiments on DBLP-four-area dataset

NMI OPT SGDClus SOSClus NetClus PathSelClus FctClusPaper 06557 06720 08812 05402 06142 07152Author 06539 08872 08822 05488 06770 06012Conference 1 08497 1 08858 09906 08248avg (NMI) 06556 08778 08827 05503 06770 06050

Table 4 Running time of experiments on DBLP-four-area dataset

Running time (s) OPT SGDClus SOSClus NetClus PathSelClus FctClusPaper mdash mdash mdash 8026 5423 8084Author mdash mdash mdash 7437 6811 7749Conference mdash mdash mdash 6584 6293 6698Total time 6726 4324 8184 22047 18527 22531

the attribute types For PathSelClus we select the metapathof P-T-P A-P-C-P-A and C-P-T-P-C to cluster the papersauthors and conferences respectively And in PathSelCluswe give each cluster one seed to start

Wemodel theDBLP-four-area dataset as a 4-mode tensorwhere each mode represents one object type The 4 modesare author (A) paper (P) conference (C) and term (T)respectively Actually the sequence of the object types isinsignificant And each element of the tensor represents agene-network in the heterogeneous information network Inthe experiments we set the learning rate for SOSClus to be120578 = 1(iter+1) and an optimized learning rate 120578 = 1(iter+119888)with 119888 = 1000125 for SGDClus By running SOSClus weaccessorily obtain the solutions of OPT So we compare theexperimental results of OPT SGDClus and SOSClus on theDBLP-four-area dataset with the three baseline methods Seethe details in Tables 2 3 and 4

In Tables 2 and 3, SOSClus performs best on average AC and NMI, and SGDClus takes second place. All methods achieve satisfactory AC and NMI on the conferences, since there are only 20 conferences in the network. SGDClus takes the shortest running time, and OPT and SOSClus have an obvious advantage in running time compared with the baselines. The time complexity of SGDClus is $O(nnz(\mathcal{X})NK + (T-1)K^2N)$, and the time complexity of SOSClus is $O(nnz(\mathcal{X})NK + (T-1)K^2N + K^3)$, where $nnz(\mathcal{X})$ is the number of nonzero elements in $\mathcal{X}$, that is, the number of gene-networks in the heterogeneous information network. We have $nnz(\mathcal{X}) < |\mathcal{P}| \ll |E|$, $K \ll N$, and $T \ll N$. Compared with the time complexities of the three baselines, SGDClus and SOSClus have a slight disadvantage. However, it is worth noting that the three baselines can cluster only one type of objects in each run, while OPT, SGDClus, and

SOSClus can obtain the clusters of all types of objects simultaneously in a single run. This is the reason why only the total time is shown for OPT, SGDClus, and SOSClus in Table 4. Moreover, the total time of OPT, SGDClus, and SOSClus is on the same order of magnitude as the running time of the other baselines for clustering each single type of objects, which is consistent with the comparison of the time complexities.
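To make the order-of-magnitude comparison concrete, the following back-of-the-envelope MATLAB sketch plugs the DBLP-four-area figures quoted above into the dominant per-iteration terms; this is our illustrative reading of the complexity formulas, not a measured cost.

    % Illustrative evaluation of the per-iteration complexity terms.
    nnzX = 334832;                      % gene-networks in the DBLP tensor
    N = 14376 + 14475 + 20 + 8920;      % total number of objects (37791)
    K = 4;  T = 4;                      % clusters and object types
    sgdclus = nnzX*N*K + (T-1)*K^2*N;        % dominant term for SGDClus
    sosclus = sgdclus + K^3;                 % SOSClus adds only K^3
    fprintf('SGDClus %.3g, SOSClus %.3g\n', sgdclus, sosclus);

As expected, the extra $K^3 = 64$ operations of SOSClus are vanishingly small at this scale.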

6. Conclusion

In this paper, a tensor CP decomposition method for clustering heterogeneous information networks is presented. In the tensor CP decomposition clustering framework, each type of objects in the heterogeneous information network is modeled as one mode of a tensor, and the gene-networks in the network are modeled as the elements of the tensor. In other words, the tensor CP decomposition clustering framework can model different types of objects and semantic relations in the heterogeneous information network without the restriction of network schema. In addition, two stochastic gradient descent algorithms, named SGDClus and SOSClus, are designed. SGDClus and SOSClus can cluster all types of objects and the gene-networks simultaneously in a single run. The proposed algorithms outperformed other state-of-the-art clustering methods in terms of AC, NMI, and running time.

Notations

$a$: A scalar (lowercase letter)
$\mathbf{a}$: A vector (boldface lowercase letter)
$\mathbf{A}$: A matrix (boldface capital letter)
$\mathcal{X}$: A tensor (calligraphic letter)


$x_{i_1 i_2 \cdots i_N}$: The $(i_1, i_2, \ldots, i_N)$th element of an $N$th-order tensor $\mathcal{X}$
$\mathbf{X}_{(n)}$: Matricization of $\mathcal{X}$ along the $n$th mode
$\ast$: Hadamard product (element-wise product) of two tensors (or matrices or vectors) with the same dimensions
$\odot$: Khatri-Rao product of two matrices
$\|\cdot\|_F$: Frobenius norm of a tensor (or matrix or vector)
$\mathbf{a} \circ \mathbf{b}$: Outer product of two vectors
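For readers less familiar with this notation, the following MATLAB sketch evaluates each operation on small examples; the Tensor Toolbox calls (tensor, tenmat, khatrirao, norm) are our illustration under the toolbox version used in the experiments, not code from the paper.

    % Small examples of the notation (MATLAB Tensor Toolbox).
    A = [1 2; 3 4];   B = [5 6; 7 8];
    H  = A .* B;                 % Hadamard (element-wise) product
    KR = khatrirao(A, B);        % Khatri-Rao product, size 4-by-2
    a = [1; 2];   b = [3; 4];
    OP = a * b';                 % outer product a o b, a rank-1 matrix
    X  = tensor(rand(3, 4, 5));  % a random 3rd-order tensor
    Xn = tenmat(X, 2);           % matricization X_(2), size 4-by-15
    nF = norm(X);                % Frobenius norm of X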

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported by the National Natural Science Foundation of China (nos. 61401482 and 61401483).

References

[1] Y. Sun, J. Han, P. Zhao, Z. Yin, H. Cheng, and T. Wu, "RankClus: integrating clustering with ranking for heterogeneous information network analysis," in Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology (EDBT '09), pp. 565–576, Saint Petersburg, Russia, March 2009.

[2] Y. Sun, Y. Yu, and J. Han, "Ranking-based clustering of heterogeneous information networks with star network schema," in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '09), pp. 797–805, Paris, France, July 2009.

[3] Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu, "PathSim: meta path-based top-K similarity search in heterogeneous information networks," PVLDB, vol. 4, no. 11, pp. 992–1003, 2011.

[4] J. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics, pp. 281–297, University of California Press, Berkeley, Calif, USA, 1967.

[5] S. Guha, R. Rastogi, and K. Shim, "CURE: an efficient clustering algorithm for large databases," Information Systems, vol. 26, no. 1, pp. 35–58, 2001.

[6] M. Ester, H. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96), E. Simoudis, J. Han, and U. M. Fayyad, Eds., pp. 226–231, AAAI Press, Portland, Ore, USA, 1996.

[7] W. Wang, J. Yang, and R. R. Muntz, "STING: a statistical information grid approach to spatial data mining," in Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB '97), M. Jarke, M. J. Carey, K. R. Dittrich, F. H. Lochovsky, P. Loucopoulos, and M. A. Jeusfeld, Eds., pp. 186–195, Morgan Kaufmann, Athens, Greece, August 1997, http://www.vldb.org/conf/1997/P186.PDF.

[8] E. H. Ruspini, "New experimental results in fuzzy clustering," Information Sciences, vol. 6, no. 73, pp. 273–284, 1973.

[9] U. von Luxburg, "A tutorial on spectral clustering," Statistics and Computing, vol. 17, no. 4, pp. 395–416, 2007.

[10] C. Shi, Y. Li, J. Zhang, Y. Sun, and P. S. Yu, "A survey of heterogeneous information network analysis," IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 1, pp. 17–37, 2017.

[11] J. Han, Y. Sun, X. Yan, and P. S. Yu, "Mining knowledge from data: an information network analysis approach," in Proceedings of the IEEE 28th International Conference on Data Engineering (ICDE '12), pp. 1214–1217, IEEE, Washington, DC, USA, April 2012.

[12] J. Chen, W. Dai, Y. Sun, and J. Dy, "Clustering and ranking in heterogeneous information networks via gamma-poisson model," in Proceedings of the SIAM International Conference on Data Mining (SDM '15), pp. 424–432, May 2015.

[13] J. Yang, L. Chen, and J. Zhang, "FctClus: a fast clustering algorithm for heterogeneous information networks," PLoS ONE, vol. 10, no. 6, Article ID e0130086, 2015.

[14] C. Shi, R. Wang, Y. Li, P. S. Yu, and B. Wu, "Ranking-based clustering on general heterogeneous information networks by network projection," in Proceedings of the 23rd ACM International Conference on Information and Knowledge Management (CIKM '14), pp. 699–708, ACM, Shanghai, China, November 2014.

[15] Y. Sun, B. Norick, J. Han, X. Yan, P. S. Yu, and X. Yu, "Integrating meta-path selection with user-guided object clustering in heterogeneous information networks," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '12), pp. 1348–1356, ACM, Beijing, China, August 2012.

[16] Y. Sun, B. Norick, J. Han, X. Yan, P. S. Yu, and X. Yu, "PathSelClus: integrating meta-path selection with user-guided object clustering in heterogeneous information networks," ACM Transactions on Knowledge Discovery from Data, vol. 7, no. 3, pp. 723–724, 2013.

[17] X. Yu, Y. Sun, B. Norick, T. Mao, and J. Han, "User guided entity similarity search using meta-path selection in heterogeneous information networks," in Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM '12), pp. 2025–2029, ACM, November 2012.

[18] Y. Sun, C. C. Aggarwal, and J. Han, "Relation strength-aware clustering of heterogeneous information networks with incomplete attributes," Proceedings of the VLDB Endowment, vol. 5, no. 5, pp. 394–405, 2012.

[19] M. Zhang, H. Hu, Z. He, and W. Wang, "Top-k similarity search in heterogeneous information networks with x-star network schema," Expert Systems with Applications, vol. 42, no. 2, pp. 699–712, 2015.

[20] Y. Zhou and L. Liu, "Social influence based clustering of heterogeneous information networks," in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 338–346, ACM, Chicago, Ill, USA, August 2013.

[21] L. R. Tucker, "Some mathematical notes on three-mode factor analysis," Psychometrika, vol. 31, pp. 279–311, 1966.

[22] R. A. Harshman, "Foundations of the PARAFAC procedure: model and conditions for an 'explanatory' multi-mode factor analysis," UCLA Working Papers in Phonetics, 1969.

[23] H. A. L. Kiers, "Towards a standardized notation and terminology in multiway analysis," Journal of Chemometrics, vol. 14, no. 3, pp. 105–122, 2000.

[24] T. G. Kolda and B. W. Bader, "Tensor decompositions and applications," SIAM Review, vol. 51, no. 3, pp. 455–500, 2009.

[25] A. Cichocki, D. Mandic, L. De Lathauwer et al., "Tensor decompositions for signal processing applications: from two-way to multiway component analysis," IEEE Signal Processing Magazine, vol. 32, no. 2, pp. 145–163, 2015.

[26] W. Peng and T. Li, "Tensor clustering via adaptive subspace iteration," Intelligent Data Analysis, vol. 15, no. 5, pp. 695–713, 2011.

[27] J. Hastad, "Tensor rank is NP-complete," Journal of Algorithms: Cognition, Informatics and Logic, vol. 11, no. 4, pp. 644–654, 1990.

[28] S. Metzler and P. Miettinen, "Clustering Boolean tensors," Data Mining & Knowledge Discovery, vol. 29, no. 5, pp. 1343–1373, 2015.

[29] X. Cao, X. Wei, Y. Han, and D. Lin, "Robust face clustering via tensor decomposition," IEEE Transactions on Cybernetics, vol. 45, no. 11, pp. 2546–2557, 2015.

[30] I. Sutskever, R. Salakhutdinov, and J. B. Tenenbaum, "Modelling relational data using Bayesian clustered tensor factorization," in Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS '09), pp. 1821–1828, British Columbia, Canada, December 2009.

[31] B. Ermis, E. Acar, and A. T. Cemgil, "Link prediction in heterogeneous data via generalized coupled tensor factorization," Data Mining & Knowledge Discovery, vol. 29, no. 1, pp. 203–236, 2015.

[32] A. R. Benson, D. F. Gleich, and J. Leskovec, "Tensor spectral clustering for partitioning higher-order network structures," in Proceedings of the SIAM International Conference on Data Mining (SDM '15), pp. 118–126, Vancouver, Canada, May 2015.

[33] L. Xiong, X. Chen, T.-K. Huang, J. G. Schneider, and J. G. Carbonell, "Temporal collaborative filtering with Bayesian probabilistic tensor factorization," in Proceedings of the 10th SIAM International Conference on Data Mining (SDM '10), pp. 211–222, Columbus, Ohio, USA, May 2010.

[34] E. E. Papalexakis, L. Akoglu, and D. Ience, "Do more views of a graph help? Community detection and clustering in multi-graphs," in Proceedings of the 16th International Conference of Information Fusion (FUSION '13), pp. 899–905, Istanbul, Turkey, July 2013.

[35] M. Vandecappelle, M. Bousse, F. V. Eeghem, and L. De Lathauwer, "Tensor decompositions for graph clustering," Internal Report 16-170, ESAT-STADIUS, KU Leuven, Leuven, Belgium, 2016, ftp://ftp.esat.kuleuven.be/pub/SISTA/sistakulak/reports/2016_Tensor_Graph_Clustering.pdf.

[36] W. Shao, L. He, and P. S. Yu, "Clustering on multi-source incomplete data via tensor modeling and factorization," in Proceedings of the 19th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD '15), pp. 485–497, Ho Chi Minh City, Vietnam, 2015.

[37] J. D. Carroll and J.-J. Chang, "Analysis of individual differences in multidimensional scaling via an n-way generalization of 'Eckart-Young' decomposition," Psychometrika, vol. 35, no. 3, pp. 283–319, 1970.

[38] L. De Lathauwer, B. De Moor, and J. Vandewalle, "On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors," SIAM Journal on Matrix Analysis and Applications, vol. 21, no. 4, pp. 1324–1342, 2000.

[39] E. Acar, D. M. Dunlavy, and T. G. Kolda, "A scalable optimization approach for fitting canonical tensor decompositions," Journal of Chemometrics, vol. 25, no. 2, pp. 67–86, 2011.

[40] S. Hansen, T. Plantenga, and T. G. Kolda, "Newton-based optimization for Kullback-Leibler nonnegative tensor factorizations," Optimization Methods & Software, vol. 30, no. 5, pp. 1002–1029, 2015.

[41] N. Vervliet and L. De Lathauwer, "A randomized block sampling approach to canonical polyadic decomposition of large-scale tensors," IEEE Journal on Selected Topics in Signal Processing, vol. 10, no. 2, pp. 284–295, 2016.

[42] R. Ge, F. Huang, C. Jin, and Y. Yuan, "Escaping from saddle points: online stochastic gradient for tensor decomposition," http://arxiv.org/abs/1503.02101.

[43] T. G. Kolda, "Multilinear operators for higher-order decompositions," Tech. Rep. SAND2006-2081, Sandia National Laboratories, 2006, http://www.osti.gov/scitech/biblio/923081.

[44] P. Paatero, "A weighted non-negative least squares algorithm for three-way 'PARAFAC' factor analysis," Chemometrics & Intelligent Laboratory Systems, vol. 38, no. 2, pp. 223–242, 1997.

[45] P. Paatero, "Construction and analysis of degenerate PARAFAC models," Journal of Chemometrics, vol. 14, no. 3, pp. 285–299, 2000.

[46] L. Bottou and N. Murata, "Stochastic approximations and efficient learning," in The Handbook of Brain Theory and Neural Networks, 2nd edition, 2002.

[47] B. W. Bader and T. G. Kolda, "Efficient MATLAB computations with sparse and factored tensors," SIAM Journal on Scientific Computing, vol. 30, no. 1, pp. 205–231, 2007.

[48] A. Strehl and J. Ghosh, "Cluster ensembles: a knowledge reuse framework for combining multiple partitions," Journal of Machine Learning Research, vol. 3, no. 3, pp. 583–617, 2003.



Page 9: A Tensor CP Decomposition Method for Clustering ...downloads.hindawi.com/journals/sp/2017/2803091.pdf · spectral clustering, are designed to analyze homogeneous informationnetworks

Scientific Programming 9

SGDClus on synthetic datasets

0 10 20 30 40 50

Iteration

Erro

r

Syn1Syn2

Syn3Syn4

000

005

010

015

020

025

030

035

SOSClus on synthetic datasets

0 10 20 30 40 50

Iteration

Syn1Syn2

Syn3Syn4

00

01

02

03

04

05

06

07

08

Erro

r

Figure 3 Convergence speed of SGDClus and SOSClus on the 4 synthetic datasets with a common learning rate that is 120578 = 1(iter + 1)SGDClus and SOSClus We find that SOSClus has a fasterconvergence speed and better robustness with respect to thelearning rate which is clearly shown in Figure 3 AlthoughSGDClus may eventually converge to a local minimum theefficiency of the optimization near a local minimum is not allroses As shown in Figure 3 the solutions of SGDClus swingaround a local minimum This phenomenon proves that theconvergence speed of SGDClus is sensitive to the choice oflearning rate 120578

Then we modify the learning rate to 120578 = 1(iter +119888) for SGDClus where 119888 is a constant optimized in theexperiments In practice 119888 = 27855 for SGDClus runningon Syn3 and 119888 = 430245 for SGDClus running on Syn4The performance comparison of SGDClus with learning rate120578 = 1(iter + 119888) and SOSClus with learning rate 120578 = 1(iter +1) on Syn3 and Syn4 is shown in Figure 4 By employingthe optimized learning rate SGDClus converges to a localminimumquicklyHowever compared to SOSClus SGDClusstill has no advantageThe hand drawing blue circles over thecurve of SOSClus in Figure 4 shows that SOSClus can escapefrom a local minimum and find the global minimum whileSGDClus just obtains the first reaching local minimum

According to (23) and (24) we accessorily obtain the solu-tions of OPT by running SOSClus So we compare the ACand NMI of OPT SGDClus and SOSClus on the 4 syntheticdatasets which are shown in Figure 5 With the increase ofobject types in the heterogeneous information networks theAC and NMI of SOSClus and OPT increase distinctly whileperformance of SGDClus almost does not changeWhen 119879 =2 AC and NMI of these three methods on Syn1 and Syn2are almost equal and low However AC and NMI of SOSClusincrease to 1 when 119879 = 4 Since the histograms of OPT SGD-Clus and SOSClus on Syn1 and Syn2 are almost the same andon Syn3 and Syn4 are also similar we know that the param-eters 119870 and 119878 have no significant effect on the performanceGenerally the larger density119863 and the number of object types119879 in the network result in higher AC and NMI of SOSClus

Obviously in the experiments on the 4 synthetic datasetsSOSClus shows an excellent performance SOSClus has afaster convergence speed and better robustness with respectto the learning rate Meanwhile SOSClus performs better onAC and NMI because it can escape from a local minimumand find the global minimum

53 Performance Comparison on Real-World Dataset

531 Real-World Dataset Description The experiments onthe real-world dataset are used to compare the performanceof the tensor CP decomposition clustering framework withother state-of-the-art methods

The real-world dataset extracted from the DBLP databaseis the DBLP-four-area dataset which can be downloadedfrom httpwebcsuclaedusimyzsundataDBLP_four_areazip It is a four research-area subset of DBLP and is usedin [2 3 12 13 15 16 18] The four research areas in DBLP-four-area dataset are database (DB) data mining (DM)machine learning (ML) and information retrieval (IR)respectively There are five representative conferences ineach area And all related authors papers published in theseconferences and terms contained in these papersrsquo titlesare included The DBLP-four-area dataset contains 14376papers with 100 labeled 14475 authors with 4057 labeled20 labeled conferences and 8920 terms The density of theDBLP-four-area dataset is 901935 times 10minus9 so we construct a4-mode tensor with size of 14376 times 14475 times 20 times 8920 and334832 nonzero elements We compare the performance oftensor CP decomposition clustering framework with severalother methods on the labeled record in this dataset

532 Comparative Methods

(i) NetClus (see [2]) An extended version of RankClus [1]which can deal with the network follows the star networkschema The time complexity of NetClus for clustering each

10 Scientific Programming

0 10 20 30 40 50

Iteration

00

01

02

03

04

05

06

07

08

Erro

rExperiments on Syn3

SGDClusSOSClus

Escaped from a local minimum

0 10 20 30 40 50

Iteration

00

01

02

03

04

05

06

07

Erro

r

Experiments on Syn4

SGDClusSOSClus

Escaped from a local minimum

Figure 4 Convergence speed of SGDClus and SOSClus on Syn3 and Syn4 with different learning rate The learning rate for SOSClus is120578 = 1(iter + 1) while that for SGDClus is 120578 = 1(iter + 119888) where 119888 is a constant optimized in the experiments

OPTSGDClusSOSClus

Synthetic datasets

00

02

04

06

08

10

AC

Syn1 Syn2 Syn4Syn3

OPTSGDClusSOSClus

NM

I

00

02

04

06

08

10

Synthetic datasetsSyn1 Syn2 Syn4Syn3

Figure 5 The AC and NMI of OPT SGDClus and SOSClus on the 4 synthetic datasets

object type in each iteration is 119874(119870|119864| + (1198702 + 119870)119873) where119870 is the number of clusters |119864| is the number of edges in thenetwork and119873 is the total number of objects in the network

(ii) PathSelClus (see [15 16]) A clustering method based onthe predefined metapath requires a user guide In PathSel-Clus the distance between the same type objects is measuredby PathSim [3] and themethod starts with the given seeds byuser The time complexity of PathSelClus for clustering eachobject type in each iteration is 119874((119870 + 1)|P| + 119870119873) where|P| is the number of metapath instances in the networkAnd the time complexity of PathSim used by PathSelClus forclustering each object type is 119874(119873119889) where 119889 is the averagedegree of objects

(iii) FctClus (see [13]) It is a recently proposed clusteringmethod for heterogeneous information networks As withNetClus the FctClus method can deal with networks follow-ing the star network schema The time complexity of FctClusfor clustering each object type in each iteration is 119874(119870|119864| +119879119870119873)533 Experimental Results As the baseline methods canonly deal with specific schema heterogeneous informationnetworks here we must construct different subnetworks forthem For NetClus and FctClus the network is organized as astar network schema like in [2 13] where the paper (P) is thecentre type and author (A) conference (C) and term (T) are

Scientific Programming 11

Table 2 AC of experiments on DBLP-four-area dataset

AC OPT SGDClus SOSClus NetClus PathSelClus FctClusPaper 05882 08476 09007 07154 07551 07887Author 05872 08486 09486 07177 07951 08008Conference 1 099 1 09172 09950 09031avg (AC) 05892 08493 09477 07186 07951 08010

Table 3 NMI of experiments on DBLP-four-area dataset

NMI OPT SGDClus SOSClus NetClus PathSelClus FctClusPaper 06557 06720 08812 05402 06142 07152Author 06539 08872 08822 05488 06770 06012Conference 1 08497 1 08858 09906 08248avg (NMI) 06556 08778 08827 05503 06770 06050

Table 4 Running time of experiments on DBLP-four-area dataset

Running time (s) OPT SGDClus SOSClus NetClus PathSelClus FctClusPaper mdash mdash mdash 8026 5423 8084Author mdash mdash mdash 7437 6811 7749Conference mdash mdash mdash 6584 6293 6698Total time 6726 4324 8184 22047 18527 22531

the attribute types For PathSelClus we select the metapathof P-T-P A-P-C-P-A and C-P-T-P-C to cluster the papersauthors and conferences respectively And in PathSelCluswe give each cluster one seed to start

Wemodel theDBLP-four-area dataset as a 4-mode tensorwhere each mode represents one object type The 4 modesare author (A) paper (P) conference (C) and term (T)respectively Actually the sequence of the object types isinsignificant And each element of the tensor represents agene-network in the heterogeneous information network Inthe experiments we set the learning rate for SOSClus to be120578 = 1(iter+1) and an optimized learning rate 120578 = 1(iter+119888)with 119888 = 1000125 for SGDClus By running SOSClus weaccessorily obtain the solutions of OPT So we compare theexperimental results of OPT SGDClus and SOSClus on theDBLP-four-area dataset with the three baseline methods Seethe details in Tables 2 3 and 4

In Tables 2 and 3 SOSClus performs best on average ACand NMI and SGDClus takes the second place All methodsachieve satisfactory AC and NMI on the conference sincethere are only 20 conferences in the network SGDClus takesthe shortest running time and OPT and SOSClus have anobvious advantage on running time compared with otherbaselinesThe time complexity of SGDClus is119874(119899119899119911(X)119873119870+(119879 minus 1)1198702119873) and the time complexity of SOSClus is119874(119899119899119911(X)119873119870+(119879minus1)1198702119873+1198703) where 119899119899119911(X) is the num-ber of nonzero elements in X that is the number of gene-networks in the heterogeneous information network Wehave 119899119899z(X) lt |P| ≪ |119864| 119870 ≪ 119873 and 119879 ≪ 119873 Comparedwith the time complexity of the three baselines SGDClus andSOSClus have a little disadvantage However it is worth not-ing that the three baselines can only cluster one type of objectsin the network in each running while OPT SGDClus and

SOSClus can obtain the clusters of all types of objects simulta-neously by running onceThis is the reasonwhy only the totaltime is shown for OPT SGDClus and SOSClus in Table 4Moreover the total time of OPT SGDClus and SOSClusis on the same order of magnitude as the running time ofother baselines for clustering each type of objects which isconsistent with the comparison of the time complexity

6 Conclusion

In this paper a tensor CP decomposition method for clus-tering heterogeneous information networks is presented Intensor CP decomposition clustering framework each type ofobjects in heterogeneous information network is modeled asone mode of tensor and the gene-networks in the networkare modeled as the elements in tensor In other wordstensor CP decomposition clustering framework can modeldifferent types of objects and semantic relations in the het-erogeneous information network without the restriction ofnetwork schema In addition two stochastic gradient descentalgorithms named SGDClus and SOSClus are designedSGDClus and SOSClus can cluster all types of objects andthe gene-networks simultaneously by running once Theproposed algorithms outperformed other state-of-the-artclustering methods in terms of AC NMI and running time

Notations119886 A scalar (lowercase letter)a A vector (boldface lowercase letter)A A matrix (boldface capital letter)X A tensor (calligraphic letter)

12 Scientific Programming

1199091198941 1198942119894119873 The (1198941 1198942 119894119873)th element of an 119873th-order tensorX

X(119899) Matricization ofX along the 119899th modelowast Hadamard product (element-wise prod-uct) of two tensors (or matrices or vectors)with the same dimension⊙ Khatri-Rao product of two matrices sdot 119865 Frobenius norm of a tensor (or matrices orvectors)

a ∘ b Outer product of two vectors

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

This study was supported by the National Natural ScienceFoundation of China (no 61401482 and no 61401483)

References

[1] Y. Sun, J. Han, P. Zhao, Z. Yin, H. Cheng, and T. Wu, "RankClus: integrating clustering with ranking for heterogeneous information network analysis," in Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology (EDBT '09), pp. 565–576, Saint Petersburg, Russia, March 2009.

[2] Y. Sun, Y. Yu, and J. Han, "Ranking-based clustering of heterogeneous information networks with star network schema," in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '09), pp. 797–805, Paris, France, July 2009.

[3] Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu, "PathSim: meta path-based top-K similarity search in heterogeneous information networks," PVLDB, vol. 4, no. 11, pp. 992–1003, 2011.

[4] J. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics, pp. 281–297, University of California Press, Berkeley, Calif, USA, 1967.

[5] S. Guha, R. Rastogi, and K. Shim, "Cure: an efficient clustering algorithm for large databases," Information Systems, vol. 26, no. 1, pp. 35–58, 2001.

[6] M. Ester, H. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96), E. Simoudis, J. Han, and U. M. Fayyad, Eds., pp. 226–231, AAAI Press, Portland, Ore, USA, 1996.

[7] W. Wang, J. Yang, and R. R. Muntz, "STING: a statistical information grid approach to spatial data mining," in Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB '97), M. Jarke, M. J. Carey, K. R. Dittrich, F. H. Lochovsky, P. Loucopoulos, and M. A. Jeusfeld, Eds., pp. 186–195, Morgan Kaufmann, Athens, Greece, August 1997, http://www.vldb.org/conf/1997/P186.PDF.

[8] E. H. Ruspini, "New experimental results in fuzzy clustering," Information Sciences, vol. 6, no. 73, pp. 273–284, 1973.

[9] U. von Luxburg, "A tutorial on spectral clustering," Statistics and Computing, vol. 17, no. 4, pp. 395–416, 2007.

[10] C. Shi, Y. Li, J. Zhang, Y. Sun, and P. S. Yu, "A survey of heterogeneous information network analysis," IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 1, pp. 17–37, 2017.

[11] J. Han, Y. Sun, X. Yan, and P. S. Yu, "Mining knowledge from data: an information network analysis approach," in Proceedings of the IEEE 28th International Conference on Data Engineering (ICDE '12), pp. 1214–1217, IEEE, Washington, DC, USA, April 2012.

[12] J. Chen, W. Dai, Y. Sun, and J. Dy, "Clustering and ranking in heterogeneous information networks via gamma-Poisson model," in Proceedings of the SIAM International Conference on Data Mining (SDM '15), pp. 424–432, May 2015.

[13] J. Yang, L. Chen, and J. Zhang, "FctClus: a fast clustering algorithm for heterogeneous information networks," PLoS ONE, vol. 10, no. 6, Article ID e0130086, 2015.

[14] C. Shi, R. Wang, Y. Li, P. S. Yu, and B. Wu, "Ranking-based clustering on general heterogeneous information networks by network projection," in Proceedings of the 23rd ACM International Conference on Information and Knowledge Management (CIKM '14), pp. 699–708, ACM, Shanghai, China, November 2014.

[15] Y. Sun, B. Norick, J. Han, X. Yan, P. S. Yu, and X. Yu, "Integrating meta-path selection with user-guided object clustering in heterogeneous information networks," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '12), pp. 1348–1356, ACM, Beijing, China, August 2012.

[16] Y. Sun, B. Norick, J. Han, X. Yan, P. S. Yu, and X. Yu, "PathSelClus: integrating meta-path selection with user-guided object clustering in heterogeneous information networks," ACM Transactions on Knowledge Discovery from Data, vol. 7, no. 3, pp. 723–724, 2013.

[17] X. Yu, Y. Sun, B. Norick, T. Mao, and J. Han, "User guided entity similarity search using meta-path selection in heterogeneous information networks," in Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM '12), pp. 2025–2029, ACM, November 2012.

[18] Y. Sun, C. C. Aggarwal, and J. Han, "Relation strength-aware clustering of heterogeneous information networks with incomplete attributes," Proceedings of the VLDB Endowment, vol. 5, no. 5, pp. 394–405, 2012.

[19] M. Zhang, H. Hu, Z. He, and W. Wang, "Top-k similarity search in heterogeneous information networks with x-star network schema," Expert Systems with Applications, vol. 42, no. 2, pp. 699–712, 2015.

[20] Y. Zhou and L. Liu, "Social influence based clustering of heterogeneous information networks," in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 338–346, ACM, Chicago, Ill, USA, August 2013.

[21] L. R. Tucker, "Some mathematical notes on three-mode factor analysis," Psychometrika, vol. 31, pp. 279–311, 1966.

[22] R. A. Harshman, "Foundations of the PARAFAC procedure: model and conditions for an 'explanatory' multi-mode factor analysis," UCLA Working Papers in Phonetics, 1969.

[23] H. A. L. Kiers, "Towards a standardized notation and terminology in multiway analysis," Journal of Chemometrics, vol. 14, no. 3, pp. 105–122, 2000.

[24] T. G. Kolda and B. W. Bader, "Tensor decompositions and applications," SIAM Review, vol. 51, no. 3, pp. 455–500, 2009.

[25] A. Cichocki, D. Mandic, L. De Lathauwer et al., "Tensor decompositions for signal processing applications: from two-way to multiway component analysis," IEEE Signal Processing Magazine, vol. 32, no. 2, pp. 145–163, 2015.

[26] W. Peng and T. Li, "Tensor clustering via adaptive subspace iteration," Intelligent Data Analysis, vol. 15, no. 5, pp. 695–713, 2011.

[27] J. Hastad, "Tensor rank is NP-complete," Journal of Algorithms, vol. 11, no. 4, pp. 644–654, 1990.

[28] S. Metzler and P. Miettinen, "Clustering Boolean tensors," Data Mining & Knowledge Discovery, vol. 29, no. 5, pp. 1343–1373, 2015.

[29] X. Cao, X. Wei, Y. Han, and D. Lin, "Robust face clustering via tensor decomposition," IEEE Transactions on Cybernetics, vol. 45, no. 11, pp. 2546–2557, 2015.

[30] I. Sutskever, R. Salakhutdinov, and J. B. Tenenbaum, "Modelling relational data using Bayesian clustered tensor factorization," in Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS '09), pp. 1821–1828, British Columbia, Canada, December 2009.

[31] B. Ermis, E. Acar, and A. T. Cemgil, "Link prediction in heterogeneous data via generalized coupled tensor factorization," Data Mining & Knowledge Discovery, vol. 29, no. 1, pp. 203–236, 2015.

[32] A. R. Benson, D. F. Gleich, and J. Leskovec, "Tensor spectral clustering for partitioning higher-order network structures," in Proceedings of the SIAM International Conference on Data Mining (SDM '15), pp. 118–126, Vancouver, Canada, May 2015.

[33] L. Xiong, X. Chen, T.-K. Huang, J. G. Schneider, and J. G. Carbonell, "Temporal collaborative filtering with Bayesian probabilistic tensor factorization," in Proceedings of the 10th SIAM International Conference on Data Mining (SDM '10), pp. 211–222, Columbus, Ohio, USA, May 2010.

[34] E. E. Papalexakis, L. Akoglu, and D. Ience, "Do more views of a graph help? Community detection and clustering in multi-graphs," in Proceedings of the 16th International Conference on Information Fusion (FUSION '13), pp. 899–905, Istanbul, Turkey, July 2013.

[35] M. Vandecappelle, M. Bousse, F. V. Eeghem, and L. D. Lathauwer, "Tensor decompositions for graph clustering," Internal Report 16-170, ESAT-STADIUS, KU Leuven, Leuven, Belgium, 2016, ftp://ftp.esat.kuleuven.be/pub/SISTA/sistakulak/reports/2016_Tensor_Graph_Clustering.pdf.

[36] W. Shao, L. He, and P. S. Yu, "Clustering on multi-source incomplete data via tensor modeling and factorization," in Proceedings of the 19th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD '15), pp. 485–497, Ho Chi Minh City, Vietnam, 2015.

[37] J. D. Carroll and J.-J. Chang, "Analysis of individual differences in multidimensional scaling via an n-way generalization of 'Eckart-Young' decomposition," Psychometrika, vol. 35, no. 3, pp. 283–319, 1970.

[38] L. De Lathauwer, B. De Moor, and J. Vandewalle, "On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors," SIAM Journal on Matrix Analysis and Applications, vol. 21, no. 4, pp. 1324–1342, 2000.

[39] E. Acar, D. M. Dunlavy, and T. G. Kolda, "A scalable optimization approach for fitting canonical tensor decompositions," Journal of Chemometrics, vol. 25, no. 2, pp. 67–86, 2011.

[40] S. Hansen, T. Plantenga, and T. G. Kolda, "Newton-based optimization for Kullback-Leibler nonnegative tensor factorizations," Optimization Methods & Software, vol. 30, no. 5, pp. 1002–1029, 2015.

[41] N. Vervliet and L. De Lathauwer, "A randomized block sampling approach to canonical polyadic decomposition of large-scale tensors," IEEE Journal on Selected Topics in Signal Processing, vol. 10, no. 2, pp. 284–295, 2016.

[42] R. Ge, F. Huang, C. Jin, and Y. Yuan, "Escaping from saddle points: online stochastic gradient for tensor decomposition," http://arxiv.org/abs/1503.02101.

[43] T. G. Kolda, "Multilinear operators for higher-order decompositions," Tech. Rep. SAND2006-2081, Sandia National Laboratories, 2006, http://www.osti.gov/scitech/biblio/923081.

[44] P. Paatero, "A weighted non-negative least squares algorithm for three-way 'PARAFAC' factor analysis," Chemometrics & Intelligent Laboratory Systems, vol. 38, no. 2, pp. 223–242, 1997.

[45] P. Paatero, "Construction and analysis of degenerate PARAFAC models," Journal of Chemometrics, vol. 14, no. 3, pp. 285–299, 2000.

[46] L. Bottou and N. Murata, "Stochastic approximations and efficient learning," in The Handbook of Brain Theory and Neural Networks, 2nd edition, 2002.

[47] B. W. Bader and T. G. Kolda, "Efficient MATLAB computations with sparse and factored tensors," SIAM Journal on Scientific Computing, vol. 30, no. 1, pp. 205–231, 2007.

[48] A. Strehl and J. Ghosh, "Cluster ensembles: a knowledge reuse framework for combining multiple partitions," Journal of Machine Learning Research, vol. 3, no. 3, pp. 583–617, 2003.

Submit your manuscripts athttpswwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 10: A Tensor CP Decomposition Method for Clustering ...downloads.hindawi.com/journals/sp/2017/2803091.pdf · spectral clustering, are designed to analyze homogeneous informationnetworks

10 Scientific Programming

0 10 20 30 40 50

Iteration

00

01

02

03

04

05

06

07

08

Erro

rExperiments on Syn3

SGDClusSOSClus

Escaped from a local minimum

0 10 20 30 40 50

Iteration

00

01

02

03

04

05

06

07

Erro

r

Experiments on Syn4

SGDClusSOSClus

Escaped from a local minimum

Figure 4 Convergence speed of SGDClus and SOSClus on Syn3 and Syn4 with different learning rate The learning rate for SOSClus is120578 = 1(iter + 1) while that for SGDClus is 120578 = 1(iter + 119888) where 119888 is a constant optimized in the experiments

OPTSGDClusSOSClus

Synthetic datasets

00

02

04

06

08

10

AC

Syn1 Syn2 Syn4Syn3

OPTSGDClusSOSClus

NM

I

00

02

04

06

08

10

Synthetic datasetsSyn1 Syn2 Syn4Syn3

Figure 5 The AC and NMI of OPT SGDClus and SOSClus on the 4 synthetic datasets

object type in each iteration is 119874(119870|119864| + (1198702 + 119870)119873) where119870 is the number of clusters |119864| is the number of edges in thenetwork and119873 is the total number of objects in the network

(ii) PathSelClus (see [15 16]) A clustering method based onthe predefined metapath requires a user guide In PathSel-Clus the distance between the same type objects is measuredby PathSim [3] and themethod starts with the given seeds byuser The time complexity of PathSelClus for clustering eachobject type in each iteration is 119874((119870 + 1)|P| + 119870119873) where|P| is the number of metapath instances in the networkAnd the time complexity of PathSim used by PathSelClus forclustering each object type is 119874(119873119889) where 119889 is the averagedegree of objects

(iii) FctClus (see [13]) It is a recently proposed clusteringmethod for heterogeneous information networks As withNetClus the FctClus method can deal with networks follow-ing the star network schema The time complexity of FctClusfor clustering each object type in each iteration is 119874(119870|119864| +119879119870119873)533 Experimental Results As the baseline methods canonly deal with specific schema heterogeneous informationnetworks here we must construct different subnetworks forthem For NetClus and FctClus the network is organized as astar network schema like in [2 13] where the paper (P) is thecentre type and author (A) conference (C) and term (T) are

Scientific Programming 11

Table 2 AC of experiments on DBLP-four-area dataset

AC OPT SGDClus SOSClus NetClus PathSelClus FctClusPaper 05882 08476 09007 07154 07551 07887Author 05872 08486 09486 07177 07951 08008Conference 1 099 1 09172 09950 09031avg (AC) 05892 08493 09477 07186 07951 08010

Table 3 NMI of experiments on DBLP-four-area dataset

NMI OPT SGDClus SOSClus NetClus PathSelClus FctClusPaper 06557 06720 08812 05402 06142 07152Author 06539 08872 08822 05488 06770 06012Conference 1 08497 1 08858 09906 08248avg (NMI) 06556 08778 08827 05503 06770 06050

Table 4 Running time of experiments on DBLP-four-area dataset

Running time (s) OPT SGDClus SOSClus NetClus PathSelClus FctClusPaper mdash mdash mdash 8026 5423 8084Author mdash mdash mdash 7437 6811 7749Conference mdash mdash mdash 6584 6293 6698Total time 6726 4324 8184 22047 18527 22531

the attribute types For PathSelClus we select the metapathof P-T-P A-P-C-P-A and C-P-T-P-C to cluster the papersauthors and conferences respectively And in PathSelCluswe give each cluster one seed to start

Wemodel theDBLP-four-area dataset as a 4-mode tensorwhere each mode represents one object type The 4 modesare author (A) paper (P) conference (C) and term (T)respectively Actually the sequence of the object types isinsignificant And each element of the tensor represents agene-network in the heterogeneous information network Inthe experiments we set the learning rate for SOSClus to be120578 = 1(iter+1) and an optimized learning rate 120578 = 1(iter+119888)with 119888 = 1000125 for SGDClus By running SOSClus weaccessorily obtain the solutions of OPT So we compare theexperimental results of OPT SGDClus and SOSClus on theDBLP-four-area dataset with the three baseline methods Seethe details in Tables 2 3 and 4

In Tables 2 and 3 SOSClus performs best on average ACand NMI and SGDClus takes the second place All methodsachieve satisfactory AC and NMI on the conference sincethere are only 20 conferences in the network SGDClus takesthe shortest running time and OPT and SOSClus have anobvious advantage on running time compared with otherbaselinesThe time complexity of SGDClus is119874(119899119899119911(X)119873119870+(119879 minus 1)1198702119873) and the time complexity of SOSClus is119874(119899119899119911(X)119873119870+(119879minus1)1198702119873+1198703) where 119899119899119911(X) is the num-ber of nonzero elements in X that is the number of gene-networks in the heterogeneous information network Wehave 119899119899z(X) lt |P| ≪ |119864| 119870 ≪ 119873 and 119879 ≪ 119873 Comparedwith the time complexity of the three baselines SGDClus andSOSClus have a little disadvantage However it is worth not-ing that the three baselines can only cluster one type of objectsin the network in each running while OPT SGDClus and

SOSClus can obtain the clusters of all types of objects simulta-neously by running onceThis is the reasonwhy only the totaltime is shown for OPT SGDClus and SOSClus in Table 4Moreover the total time of OPT SGDClus and SOSClusis on the same order of magnitude as the running time ofother baselines for clustering each type of objects which isconsistent with the comparison of the time complexity

6 Conclusion

In this paper a tensor CP decomposition method for clus-tering heterogeneous information networks is presented Intensor CP decomposition clustering framework each type ofobjects in heterogeneous information network is modeled asone mode of tensor and the gene-networks in the networkare modeled as the elements in tensor In other wordstensor CP decomposition clustering framework can modeldifferent types of objects and semantic relations in the het-erogeneous information network without the restriction ofnetwork schema In addition two stochastic gradient descentalgorithms named SGDClus and SOSClus are designedSGDClus and SOSClus can cluster all types of objects andthe gene-networks simultaneously by running once Theproposed algorithms outperformed other state-of-the-artclustering methods in terms of AC NMI and running time

Notations119886 A scalar (lowercase letter)a A vector (boldface lowercase letter)A A matrix (boldface capital letter)X A tensor (calligraphic letter)

12 Scientific Programming

1199091198941 1198942119894119873 The (1198941 1198942 119894119873)th element of an 119873th-order tensorX

X(119899) Matricization ofX along the 119899th modelowast Hadamard product (element-wise prod-uct) of two tensors (or matrices or vectors)with the same dimension⊙ Khatri-Rao product of two matrices sdot 119865 Frobenius norm of a tensor (or matrices orvectors)

a ∘ b Outer product of two vectors

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

This study was supported by the National Natural ScienceFoundation of China (no 61401482 and no 61401483)

References

[1] Y Sunt J Hant P Zhao Z Yin H Cheng and T WuldquoRankClus integrating clustering with ranking for heteroge-neous information network analysisrdquo in Proceedings of the 12thInternational Conference on Extending Database TechnologyAdvances inDatabase Technology (EDBT rsquo09) pp 565ndash576 SaintPetersburg Russia March 2009

[2] Y Sun Y Yu and J Han ldquoRanking-based clustering of hetero-geneous information networks with star network schemardquo inProceedings of the 15th ACM SIGKDD International Conferenceon Knowledge Discovery and Data Mining (KDD rsquo09) pp 797ndash805 Paris France July 2009

[3] Y Sun J Han X Yan P S Yu and TWu ldquoPathSim meta path-based top-K similarity search in heterogeneous informationnetworksrdquo PVLDB vol 4 no 11 pp 992ndash1003 2011

[4] J MacQueen ldquoSome methods for classification and analysis ofmultivariate observationsrdquo in Proceedings of the 5th BerkeleySymposium on Mathematical Statistics and Probability Volume1 Statistics pp 281ndash297 University of California Press BerkeleyCalif USA 1967

[5] S Guha R Rastogi and K Shim ldquoCure an efficient clusteringalgorithm for large databasesrdquo Information Systems vol 26 no1 pp 35ndash58 2001

[6] M Ester H Kriegel J Sander and X Xu ldquoA density-basedalgorithm for discovering clusters in large spatial databases withnoiserdquo in Proceedings of the 2nd International Conference onKnowledge Discovery and Data Mining (KDD-96) E SimoudisJ Han and U M Fayyad Eds pp 226ndash231 AAAI PressPortland Ore USA 1996

[7] W Wang J Yang and R R Muntz ldquoSTING a statisticalinformation grid approach to spatial data miningrdquo in Pro-ceedings of the 23rd International Conference on Very LargeData Bases (VLDB rsquo97) M Jarke M J Carey K R DittrichF H Lochovsky P Loucopoulos and M A Jeusfeld Edspp 186ndash195 Morgan Kaufmann Athens Greece August 1997httpwwwvldborgconf1997P186PDF

[8] E H Ruspini ldquoNew experimental results in fuzzy clusteringrdquoInformation Sciences vol 6 no 73 pp 273ndash284 1973

[9] U von Luxburg ldquoA tutorial on spectral clusteringrdquo Statistics andComputing vol 17 no 4 pp 395ndash416 2007

[10] C Shi Y Li J Zhang Y Sun and P S Yu ldquoA survey of hetero-geneous information network analysisrdquo IEEE Transactions onKnowledge and Data Engineering vol 29 no 1 pp 17ndash37 2017

[11] J Han Y Sun X Yan and P S Yu ldquoMining knowledge fromdata an information network analysis approachrdquo in Proceedingsof the IEEE 28th International Conference on Data Engineering(ICDE rsquo12) pp 1214ndash1217 IEEE Washington DC USA April2012

[12] J Chen W Dai Y Sun and J Dy ldquoClustering and rankingin heterogeneous information networks via gamma-poissonmodelrdquo in Proceedings of the SIAM International Conference onData Mining (SDM rsquo15) pp 424ndash432 May 2015

[13] J Yang L Chen and J Zhang ldquoFctClus a fast clustering algo-rithm for heterogeneous information networksrdquoPLoSONE vol10 no 6 Article ID e0130086 2015

[14] C Shi R Wang Y Li P S Yu and B Wu ldquoRanking-basedclustering on general heterogeneous information networks bynetwork projectionrdquo in Proceedings of the 23rd ACM Interna-tional Conference on Conference on Information and KnowledgeManagement (CIKM rsquo14) pp 699ndash708 ACM Shanghai ChinaNovember 2014

[15] Y Sun B Norick J Han X Yan P S Yu and X Yu ldquoInte-grating meta-path selection with user-guided object clusteringin heterogeneous information networksrdquo in Proceedings of the18th ACM SIGKDD International Conference on KnowledgeDiscovery and Data Mining (KDD rsquo12) pp 1348ndash1356 ACMBeijing China August 2012

[16] Y Sun B Norick J Han X Yan P S Yu and X YuldquoPathSelClus integrating meta-path selection with user-guidedObject clustering in heterogeneous information networksrdquoACMTransactions onKnowledgeDiscovery fromData vol 7 no3 pp 723ndash724 2013

[17] X Yu Y Sun B Norick T Mao and J Han ldquoUser guided entitysimilarity search using meta-path selection in heterogeneousinformation networksrdquo in Proceedings of the 21st ACM Interna-tional Conference on Information and Knowledge Management(CIKM rsquo12) pp 2025ndash2029 ACM November 2012

[18] Y Sun C C Aggarwal and J Han ldquoRelation strength-awareclustering of heterogeneous information networks with incom-plete attributesrdquoProceedings of the VLDBEndowment vol 5 no5 pp 394ndash405 2012

[19] M Zhang H Hu Z He andWWang ldquoTop-k similarity searchin heterogeneous information networks with x-star networkschemardquo Expert Systems with Applications vol 42 no 2 pp699ndash712 2015

[20] Y Zhou and L Liu ldquoSocial influence based clustering ofheterogeneous information networksrdquo in Proceedings of the19th ACM SIGKDD International Conference on KnowledgeDiscovery and Data Mining pp 338ndash346 ACM Chicago IllUSA August 2013

[21] L R Tucker ldquoSome mathematical notes on three-mode factoranalysisrdquo Psychometrika vol 31 pp 279ndash311 1966

[22] R A Harshman ldquoFoundations of the parafac procedure modeland conditions for an lsquoexplanatoryrsquo multi-mode factor analysisrdquoUCLAWorking Papers in Phonetics 1969

[23] H A L Kiers ldquoTowards a standardized notation and terminol-ogy in multiway analysisrdquo Journal of Chemometrics vol 14 no3 pp 105ndash122 2000

[24] T G Kolda and B W Bader ldquoTensor decompositions andapplicationsrdquo SIAM Review vol 51 no 3 pp 455ndash500 2009

Scientific Programming 13

[25] A Cichocki D Mandic L De Lathauwer et al ldquoTensordecompositions for signal processing applications from two-way to multiway component analysisrdquo IEEE Signal ProcessingMagazine vol 32 no 2 pp 145ndash163 2015

[26] W Peng and T Li ldquoTensor clustering via adaptive subspaceiterationrdquo Intelligent Data Analysis vol 15 no 5 pp 695ndash7132011

[27] J Hastad ldquoTensor rank is NP-completerdquo Journal of AlgorithmsCognition Informatics and Logic vol 11 no 4 pp 644ndash6541990

[28] S Metzler and P Miettinen ldquoClustering Boolean tensorsrdquo DataMining amp Knowledge Discovery vol 29 no 5 pp 1343ndash13732015

[29] X Cao X Wei Y Han and D Lin ldquoRobust face clustering viatensor decompositionrdquo IEEE Transactions on Cybernetics vol45 no 11 pp 2546ndash2557 2015

[30] I Sutskever R Salakhutdinov and J B Tenenbaum ldquoModellingrelational data using Bayesian clustered tensor factorizationrdquoin Proceedings of the 23rd Annual Conference on Neural Infor-mation Processing Systems (NIPS rsquo09) pp 1821ndash1828 BritishColumbia Canada December 2009

[31] B Ermis E Acar and A T Cemgil ldquoLink prediction in het-erogeneous data via generalized coupled tensor factorizationrdquoDataMining amp Knowledge Discovery vol 29 no 1 pp 203ndash2362015

[32] A R Benson D F Gleiche and J Leskovec ldquoTensor spectralclustering for partitioning higher-order network structuresrdquoin Proceedings of the SIAM International Conference on DataMining (SDM rsquo15) pp 118ndash126 Vancouver Canada May 2015

[33] L Xiong X Chen T-K Huang J G Schneider and JG Carbonell ldquoTemporal collaborative filtering with Bayesianprobabilistic tensor factorizationrdquo in Proceedings of the 10thSIAM International Conference on Data Mining (SDM rsquo10) pp211ndash222 Columbus Ohio USA May 2010

[34] E E Papalexakis L Akoglu and D Ience ldquoDo more views ofa graph help Community detection and clustering in multi-graphsrdquo in Proceedings of the 16th International Conferenceof Information Fusion (FUSION rsquo13) pp 899ndash905 IstanbulTurkey July 2013

[35] M Vandecappelle M Bousse F V Eeghem and L D Lath-auwer ldquoTensor decompositions for graph clusteringrdquo Inter-nal Report 16-170 ESAT-STADIUS KU Leuven LeuvenBelgium 2016 ftpftpesatkuleuvenbepubSISTAsistakul-akreports2016_Tensor_Graph_Clusteringpdf

[36] W Shao L He and P S Yu ldquoClustering on multi-sourceincomplete data via tensor modeling and factorizationrdquo inProceedings of the 19th Pacific-Asia Conference on KnowledgeDiscovery and Data Mining (PAKDD rsquo15) pp 485ndash497 Ho ChiMinh City Vietnam 2015

[37] J D Carroll and J-J Chang ldquoAnalysis of individual differencesin multidimensional scaling via an n-way generalization ofldquoEckart-Youngrdquo decompositionrdquo Psychometrika vol 35 no 3pp 283ndash319 1970

[38] L De Lathauwer B De Moor and J Vandewalle ldquoOn thebest rank-1 and rank-(R1 R2 RN) approximation ofhigher-order tensorsrdquo SIAM Journal on Matrix Analysis andApplications vol 21 no 4 pp 1324ndash1342 2000

[39] E Acar D M Dunlavy and T G Kolda ldquoA scalable opti-mization approach for fitting canonical tensor decompositionsrdquoJournal of Chemometrics vol 25 no 2 pp 67ndash86 2011

[40] S Hansen T Plantenga and T G Kolda ldquoNewton-basedoptimization for Kullback-Leibler nonnegative tensor factor-izationsrdquo Optimization Methods amp Software vol 30 no 5 pp1002ndash1029 2015

[41] NVervliet and LDe Lathauwer ldquoA randomized block samplingapproach to canonical polyadic decomposition of large-scaletensorsrdquo IEEE Journal on SelectedTopics in Signal Processing vol10 no 2 pp 284ndash295 2016

[42] R Ge F Huang C Jin and Y Yuan ldquoEscaping from saddlepointsmdashonline stochastic gradient for tensor decompositionrdquohttparxivorgabs150302101

[43] T G Kolda ldquoMultilinear operators for higher-order decompo-sitionsrdquo Tech Rep SAND2006-2081 Sandia National Labora-tories 2006 httpwwwostigovscitechbiblio923081=0pt

[44] P Paatero ldquoA weighted non-negative least squares algorithmfor three-way lsquoPARAFACrsquo factor analysisrdquo Chemometrics ampIntelligent Laboratory Systems vol 38 no 2 pp 223ndash242 1997

[45] P Paatero ldquoConstruction and analysis of degenerate PARAFACmodelsrdquo Journal of Chemometrics vol 14 no 3 pp 285ndash2992000

[46] B L Bottou and N Murata ldquoStochastic approximations andefficient learningrdquo inTheHandbook of BrainTheory and NeuralNetworks 2nd edition 2002

[47] B W Bader and T G Kolda ldquoEfficient MATLAB computationswith sparse and factored tensorsrdquo SIAM Journal on ScientificComputing vol 30 no 1 pp 205ndash231 2007

[48] A Strehl and J Ghosh ldquoCluster ensemblesmdasha knowledgereuse framework for combining multiple partitionsrdquo Journal ofMachine Learning Research vol 3 no 3 pp 583ndash617 2003

Submit your manuscripts athttpswwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 11: A Tensor CP Decomposition Method for Clustering ...downloads.hindawi.com/journals/sp/2017/2803091.pdf · spectral clustering, are designed to analyze homogeneous informationnetworks

Scientific Programming 11

Table 2 AC of experiments on DBLP-four-area dataset

AC OPT SGDClus SOSClus NetClus PathSelClus FctClusPaper 05882 08476 09007 07154 07551 07887Author 05872 08486 09486 07177 07951 08008Conference 1 099 1 09172 09950 09031avg (AC) 05892 08493 09477 07186 07951 08010

Table 3 NMI of experiments on DBLP-four-area dataset

NMI OPT SGDClus SOSClus NetClus PathSelClus FctClusPaper 06557 06720 08812 05402 06142 07152Author 06539 08872 08822 05488 06770 06012Conference 1 08497 1 08858 09906 08248avg (NMI) 06556 08778 08827 05503 06770 06050

Table 4 Running time of experiments on DBLP-four-area dataset

Running time (s) OPT SGDClus SOSClus NetClus PathSelClus FctClusPaper mdash mdash mdash 8026 5423 8084Author mdash mdash mdash 7437 6811 7749Conference mdash mdash mdash 6584 6293 6698Total time 6726 4324 8184 22047 18527 22531

the attribute types For PathSelClus we select the metapathof P-T-P A-P-C-P-A and C-P-T-P-C to cluster the papersauthors and conferences respectively And in PathSelCluswe give each cluster one seed to start

Wemodel theDBLP-four-area dataset as a 4-mode tensorwhere each mode represents one object type The 4 modesare author (A) paper (P) conference (C) and term (T)respectively Actually the sequence of the object types isinsignificant And each element of the tensor represents agene-network in the heterogeneous information network Inthe experiments we set the learning rate for SOSClus to be120578 = 1(iter+1) and an optimized learning rate 120578 = 1(iter+119888)with 119888 = 1000125 for SGDClus By running SOSClus weaccessorily obtain the solutions of OPT So we compare theexperimental results of OPT SGDClus and SOSClus on theDBLP-four-area dataset with the three baseline methods Seethe details in Tables 2 3 and 4

In Tables 2 and 3 SOSClus performs best on average ACand NMI and SGDClus takes the second place All methodsachieve satisfactory AC and NMI on the conference sincethere are only 20 conferences in the network SGDClus takesthe shortest running time and OPT and SOSClus have anobvious advantage on running time compared with otherbaselinesThe time complexity of SGDClus is119874(119899119899119911(X)119873119870+(119879 minus 1)1198702119873) and the time complexity of SOSClus is119874(119899119899119911(X)119873119870+(119879minus1)1198702119873+1198703) where 119899119899119911(X) is the num-ber of nonzero elements in X that is the number of gene-networks in the heterogeneous information network Wehave 119899119899z(X) lt |P| ≪ |119864| 119870 ≪ 119873 and 119879 ≪ 119873 Comparedwith the time complexity of the three baselines SGDClus andSOSClus have a little disadvantage However it is worth not-ing that the three baselines can only cluster one type of objectsin the network in each running while OPT SGDClus and

SOSClus can obtain the clusters of all types of objects simulta-neously by running onceThis is the reasonwhy only the totaltime is shown for OPT SGDClus and SOSClus in Table 4Moreover the total time of OPT SGDClus and SOSClusis on the same order of magnitude as the running time ofother baselines for clustering each type of objects which isconsistent with the comparison of the time complexity

6 Conclusion

In this paper a tensor CP decomposition method for clus-tering heterogeneous information networks is presented Intensor CP decomposition clustering framework each type ofobjects in heterogeneous information network is modeled asone mode of tensor and the gene-networks in the networkare modeled as the elements in tensor In other wordstensor CP decomposition clustering framework can modeldifferent types of objects and semantic relations in the het-erogeneous information network without the restriction ofnetwork schema In addition two stochastic gradient descentalgorithms named SGDClus and SOSClus are designedSGDClus and SOSClus can cluster all types of objects andthe gene-networks simultaneously by running once Theproposed algorithms outperformed other state-of-the-artclustering methods in terms of AC NMI and running time

Notations119886 A scalar (lowercase letter)a A vector (boldface lowercase letter)A A matrix (boldface capital letter)X A tensor (calligraphic letter)

12 Scientific Programming

1199091198941 1198942119894119873 The (1198941 1198942 119894119873)th element of an 119873th-order tensorX

X(119899) Matricization ofX along the 119899th modelowast Hadamard product (element-wise prod-uct) of two tensors (or matrices or vectors)with the same dimension⊙ Khatri-Rao product of two matrices sdot 119865 Frobenius norm of a tensor (or matrices orvectors)

a ∘ b Outer product of two vectors

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

This study was supported by the National Natural ScienceFoundation of China (no 61401482 and no 61401483)

References

[1] Y Sunt J Hant P Zhao Z Yin H Cheng and T WuldquoRankClus integrating clustering with ranking for heteroge-neous information network analysisrdquo in Proceedings of the 12thInternational Conference on Extending Database TechnologyAdvances inDatabase Technology (EDBT rsquo09) pp 565ndash576 SaintPetersburg Russia March 2009

[2] Y Sun Y Yu and J Han ldquoRanking-based clustering of hetero-geneous information networks with star network schemardquo inProceedings of the 15th ACM SIGKDD International Conferenceon Knowledge Discovery and Data Mining (KDD rsquo09) pp 797ndash805 Paris France July 2009

[3] Y Sun J Han X Yan P S Yu and TWu ldquoPathSim meta path-based top-K similarity search in heterogeneous informationnetworksrdquo PVLDB vol 4 no 11 pp 992ndash1003 2011

[4] J MacQueen ldquoSome methods for classification and analysis ofmultivariate observationsrdquo in Proceedings of the 5th BerkeleySymposium on Mathematical Statistics and Probability Volume1 Statistics pp 281ndash297 University of California Press BerkeleyCalif USA 1967

[5] S Guha R Rastogi and K Shim ldquoCure an efficient clusteringalgorithm for large databasesrdquo Information Systems vol 26 no1 pp 35ndash58 2001

[6] M Ester H Kriegel J Sander and X Xu ldquoA density-basedalgorithm for discovering clusters in large spatial databases withnoiserdquo in Proceedings of the 2nd International Conference onKnowledge Discovery and Data Mining (KDD-96) E SimoudisJ Han and U M Fayyad Eds pp 226ndash231 AAAI PressPortland Ore USA 1996

[7] W Wang J Yang and R R Muntz ldquoSTING a statisticalinformation grid approach to spatial data miningrdquo in Pro-ceedings of the 23rd International Conference on Very LargeData Bases (VLDB rsquo97) M Jarke M J Carey K R DittrichF H Lochovsky P Loucopoulos and M A Jeusfeld Edspp 186ndash195 Morgan Kaufmann Athens Greece August 1997httpwwwvldborgconf1997P186PDF

[8] E H Ruspini ldquoNew experimental results in fuzzy clusteringrdquoInformation Sciences vol 6 no 73 pp 273ndash284 1973

[9] U von Luxburg ldquoA tutorial on spectral clusteringrdquo Statistics andComputing vol 17 no 4 pp 395ndash416 2007

[10] C Shi Y Li J Zhang Y Sun and P S Yu ldquoA survey of hetero-geneous information network analysisrdquo IEEE Transactions onKnowledge and Data Engineering vol 29 no 1 pp 17ndash37 2017

[11] J Han Y Sun X Yan and P S Yu ldquoMining knowledge fromdata an information network analysis approachrdquo in Proceedingsof the IEEE 28th International Conference on Data Engineering(ICDE rsquo12) pp 1214ndash1217 IEEE Washington DC USA April2012

[12] J Chen W Dai Y Sun and J Dy ldquoClustering and rankingin heterogeneous information networks via gamma-poissonmodelrdquo in Proceedings of the SIAM International Conference onData Mining (SDM rsquo15) pp 424ndash432 May 2015

[13] J Yang L Chen and J Zhang ldquoFctClus a fast clustering algo-rithm for heterogeneous information networksrdquoPLoSONE vol10 no 6 Article ID e0130086 2015

[14] C Shi R Wang Y Li P S Yu and B Wu ldquoRanking-basedclustering on general heterogeneous information networks bynetwork projectionrdquo in Proceedings of the 23rd ACM Interna-tional Conference on Conference on Information and KnowledgeManagement (CIKM rsquo14) pp 699ndash708 ACM Shanghai ChinaNovember 2014

[15] Y Sun B Norick J Han X Yan P S Yu and X Yu ldquoInte-grating meta-path selection with user-guided object clusteringin heterogeneous information networksrdquo in Proceedings of the18th ACM SIGKDD International Conference on KnowledgeDiscovery and Data Mining (KDD rsquo12) pp 1348ndash1356 ACMBeijing China August 2012

[16] Y Sun B Norick J Han X Yan P S Yu and X YuldquoPathSelClus integrating meta-path selection with user-guidedObject clustering in heterogeneous information networksrdquoACMTransactions onKnowledgeDiscovery fromData vol 7 no3 pp 723ndash724 2013

[17] X Yu Y Sun B Norick T Mao and J Han ldquoUser guided entitysimilarity search using meta-path selection in heterogeneousinformation networksrdquo in Proceedings of the 21st ACM Interna-tional Conference on Information and Knowledge Management(CIKM rsquo12) pp 2025ndash2029 ACM November 2012

[18] Y Sun C C Aggarwal and J Han ldquoRelation strength-awareclustering of heterogeneous information networks with incom-plete attributesrdquoProceedings of the VLDBEndowment vol 5 no5 pp 394ndash405 2012

[19] M Zhang H Hu Z He andWWang ldquoTop-k similarity searchin heterogeneous information networks with x-star networkschemardquo Expert Systems with Applications vol 42 no 2 pp699ndash712 2015

[20] Y Zhou and L Liu ldquoSocial influence based clustering ofheterogeneous information networksrdquo in Proceedings of the19th ACM SIGKDD International Conference on KnowledgeDiscovery and Data Mining pp 338ndash346 ACM Chicago IllUSA August 2013

[21] L R Tucker ldquoSome mathematical notes on three-mode factoranalysisrdquo Psychometrika vol 31 pp 279ndash311 1966

[22] R A Harshman ldquoFoundations of the parafac procedure modeland conditions for an lsquoexplanatoryrsquo multi-mode factor analysisrdquoUCLAWorking Papers in Phonetics 1969

[23] H A L Kiers ldquoTowards a standardized notation and terminol-ogy in multiway analysisrdquo Journal of Chemometrics vol 14 no3 pp 105ndash122 2000

[24] T G Kolda and B W Bader ldquoTensor decompositions andapplicationsrdquo SIAM Review vol 51 no 3 pp 455ndash500 2009

Scientific Programming 13

[25] A Cichocki D Mandic L De Lathauwer et al ldquoTensordecompositions for signal processing applications from two-way to multiway component analysisrdquo IEEE Signal ProcessingMagazine vol 32 no 2 pp 145ndash163 2015

[26] W Peng and T Li ldquoTensor clustering via adaptive subspaceiterationrdquo Intelligent Data Analysis vol 15 no 5 pp 695ndash7132011

[27] J Hastad ldquoTensor rank is NP-completerdquo Journal of AlgorithmsCognition Informatics and Logic vol 11 no 4 pp 644ndash6541990

[28] S Metzler and P Miettinen ldquoClustering Boolean tensorsrdquo DataMining amp Knowledge Discovery vol 29 no 5 pp 1343ndash13732015

[29] X Cao X Wei Y Han and D Lin ldquoRobust face clustering viatensor decompositionrdquo IEEE Transactions on Cybernetics vol45 no 11 pp 2546ndash2557 2015

[30] I Sutskever R Salakhutdinov and J B Tenenbaum ldquoModellingrelational data using Bayesian clustered tensor factorizationrdquoin Proceedings of the 23rd Annual Conference on Neural Infor-mation Processing Systems (NIPS rsquo09) pp 1821ndash1828 BritishColumbia Canada December 2009

[31] B Ermis E Acar and A T Cemgil ldquoLink prediction in het-erogeneous data via generalized coupled tensor factorizationrdquoDataMining amp Knowledge Discovery vol 29 no 1 pp 203ndash2362015

[32] A R Benson D F Gleiche and J Leskovec ldquoTensor spectralclustering for partitioning higher-order network structuresrdquoin Proceedings of the SIAM International Conference on DataMining (SDM rsquo15) pp 118ndash126 Vancouver Canada May 2015

[33] L Xiong X Chen T-K Huang J G Schneider and JG Carbonell ldquoTemporal collaborative filtering with Bayesianprobabilistic tensor factorizationrdquo in Proceedings of the 10thSIAM International Conference on Data Mining (SDM rsquo10) pp211ndash222 Columbus Ohio USA May 2010

[34] E E Papalexakis L Akoglu and D Ience ldquoDo more views ofa graph help Community detection and clustering in multi-graphsrdquo in Proceedings of the 16th International Conferenceof Information Fusion (FUSION rsquo13) pp 899ndash905 IstanbulTurkey July 2013

[35] M Vandecappelle M Bousse F V Eeghem and L D Lath-auwer ldquoTensor decompositions for graph clusteringrdquo Inter-nal Report 16-170 ESAT-STADIUS KU Leuven LeuvenBelgium 2016 ftpftpesatkuleuvenbepubSISTAsistakul-akreports2016_Tensor_Graph_Clusteringpdf

[36] W Shao L He and P S Yu ldquoClustering on multi-sourceincomplete data via tensor modeling and factorizationrdquo inProceedings of the 19th Pacific-Asia Conference on KnowledgeDiscovery and Data Mining (PAKDD rsquo15) pp 485ndash497 Ho ChiMinh City Vietnam 2015

[37] J D Carroll and J-J Chang ldquoAnalysis of individual differencesin multidimensional scaling via an n-way generalization ofldquoEckart-Youngrdquo decompositionrdquo Psychometrika vol 35 no 3pp 283ndash319 1970

[38] L De Lathauwer B De Moor and J Vandewalle ldquoOn thebest rank-1 and rank-(R1 R2 RN) approximation ofhigher-order tensorsrdquo SIAM Journal on Matrix Analysis andApplications vol 21 no 4 pp 1324ndash1342 2000

[39] E Acar D M Dunlavy and T G Kolda ldquoA scalable opti-mization approach for fitting canonical tensor decompositionsrdquoJournal of Chemometrics vol 25 no 2 pp 67ndash86 2011

[40] S Hansen T Plantenga and T G Kolda ldquoNewton-basedoptimization for Kullback-Leibler nonnegative tensor factor-izationsrdquo Optimization Methods amp Software vol 30 no 5 pp1002ndash1029 2015

[41] NVervliet and LDe Lathauwer ldquoA randomized block samplingapproach to canonical polyadic decomposition of large-scaletensorsrdquo IEEE Journal on SelectedTopics in Signal Processing vol10 no 2 pp 284ndash295 2016

[42] R Ge F Huang C Jin and Y Yuan ldquoEscaping from saddlepointsmdashonline stochastic gradient for tensor decompositionrdquohttparxivorgabs150302101

[43] T G Kolda ldquoMultilinear operators for higher-order decompo-sitionsrdquo Tech Rep SAND2006-2081 Sandia National Labora-tories 2006 httpwwwostigovscitechbiblio923081=0pt

[44] P Paatero ldquoA weighted non-negative least squares algorithmfor three-way lsquoPARAFACrsquo factor analysisrdquo Chemometrics ampIntelligent Laboratory Systems vol 38 no 2 pp 223ndash242 1997

[45] P Paatero ldquoConstruction and analysis of degenerate PARAFACmodelsrdquo Journal of Chemometrics vol 14 no 3 pp 285ndash2992000

[46] B L Bottou and N Murata ldquoStochastic approximations andefficient learningrdquo inTheHandbook of BrainTheory and NeuralNetworks 2nd edition 2002

[47] B W Bader and T G Kolda ldquoEfficient MATLAB computationswith sparse and factored tensorsrdquo SIAM Journal on ScientificComputing vol 30 no 1 pp 205ndash231 2007

[48] A Strehl and J Ghosh ldquoCluster ensemblesmdasha knowledgereuse framework for combining multiple partitionsrdquo Journal ofMachine Learning Research vol 3 no 3 pp 583ndash617 2003

Submit your manuscripts athttpswwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 12: A Tensor CP Decomposition Method for Clustering ...downloads.hindawi.com/journals/sp/2017/2803091.pdf · spectral clustering, are designed to analyze homogeneous informationnetworks

12 Scientific Programming

1199091198941 1198942119894119873 The (1198941 1198942 119894119873)th element of an 119873th-order tensorX

X(119899) Matricization ofX along the 119899th modelowast Hadamard product (element-wise prod-uct) of two tensors (or matrices or vectors)with the same dimension⊙ Khatri-Rao product of two matrices sdot 119865 Frobenius norm of a tensor (or matrices orvectors)

a ∘ b Outer product of two vectors

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

This study was supported by the National Natural ScienceFoundation of China (no 61401482 and no 61401483)

References

[1] Y Sunt J Hant P Zhao Z Yin H Cheng and T WuldquoRankClus integrating clustering with ranking for heteroge-neous information network analysisrdquo in Proceedings of the 12thInternational Conference on Extending Database TechnologyAdvances inDatabase Technology (EDBT rsquo09) pp 565ndash576 SaintPetersburg Russia March 2009

[2] Y Sun Y Yu and J Han ldquoRanking-based clustering of hetero-geneous information networks with star network schemardquo inProceedings of the 15th ACM SIGKDD International Conferenceon Knowledge Discovery and Data Mining (KDD rsquo09) pp 797ndash805 Paris France July 2009

[3] Y Sun J Han X Yan P S Yu and TWu ldquoPathSim meta path-based top-K similarity search in heterogeneous informationnetworksrdquo PVLDB vol 4 no 11 pp 992ndash1003 2011

[4] J MacQueen ldquoSome methods for classification and analysis ofmultivariate observationsrdquo in Proceedings of the 5th BerkeleySymposium on Mathematical Statistics and Probability Volume1 Statistics pp 281ndash297 University of California Press BerkeleyCalif USA 1967

[5] S Guha R Rastogi and K Shim ldquoCure an efficient clusteringalgorithm for large databasesrdquo Information Systems vol 26 no1 pp 35ndash58 2001

[6] M Ester H Kriegel J Sander and X Xu ldquoA density-basedalgorithm for discovering clusters in large spatial databases withnoiserdquo in Proceedings of the 2nd International Conference onKnowledge Discovery and Data Mining (KDD-96) E SimoudisJ Han and U M Fayyad Eds pp 226ndash231 AAAI PressPortland Ore USA 1996

[7] W Wang J Yang and R R Muntz ldquoSTING a statisticalinformation grid approach to spatial data miningrdquo in Pro-ceedings of the 23rd International Conference on Very LargeData Bases (VLDB rsquo97) M Jarke M J Carey K R DittrichF H Lochovsky P Loucopoulos and M A Jeusfeld Edspp 186ndash195 Morgan Kaufmann Athens Greece August 1997httpwwwvldborgconf1997P186PDF

[8] E H Ruspini ldquoNew experimental results in fuzzy clusteringrdquoInformation Sciences vol 6 no 73 pp 273ndash284 1973

[9] U von Luxburg ldquoA tutorial on spectral clusteringrdquo Statistics andComputing vol 17 no 4 pp 395ndash416 2007

[10] C Shi Y Li J Zhang Y Sun and P S Yu ldquoA survey of hetero-geneous information network analysisrdquo IEEE Transactions onKnowledge and Data Engineering vol 29 no 1 pp 17ndash37 2017

[11] J Han Y Sun X Yan and P S Yu ldquoMining knowledge fromdata an information network analysis approachrdquo in Proceedingsof the IEEE 28th International Conference on Data Engineering(ICDE rsquo12) pp 1214ndash1217 IEEE Washington DC USA April2012

[12] J Chen W Dai Y Sun and J Dy ldquoClustering and rankingin heterogeneous information networks via gamma-poissonmodelrdquo in Proceedings of the SIAM International Conference onData Mining (SDM rsquo15) pp 424ndash432 May 2015

[13] J Yang L Chen and J Zhang ldquoFctClus a fast clustering algo-rithm for heterogeneous information networksrdquoPLoSONE vol10 no 6 Article ID e0130086 2015

[14] C Shi R Wang Y Li P S Yu and B Wu ldquoRanking-basedclustering on general heterogeneous information networks bynetwork projectionrdquo in Proceedings of the 23rd ACM Interna-tional Conference on Conference on Information and KnowledgeManagement (CIKM rsquo14) pp 699ndash708 ACM Shanghai ChinaNovember 2014

[15] Y Sun B Norick J Han X Yan P S Yu and X Yu ldquoInte-grating meta-path selection with user-guided object clusteringin heterogeneous information networksrdquo in Proceedings of the18th ACM SIGKDD International Conference on KnowledgeDiscovery and Data Mining (KDD rsquo12) pp 1348ndash1356 ACMBeijing China August 2012

[16] Y Sun B Norick J Han X Yan P S Yu and X YuldquoPathSelClus integrating meta-path selection with user-guidedObject clustering in heterogeneous information networksrdquoACMTransactions onKnowledgeDiscovery fromData vol 7 no3 pp 723ndash724 2013

[17] X Yu Y Sun B Norick T Mao and J Han ldquoUser guided entitysimilarity search using meta-path selection in heterogeneousinformation networksrdquo in Proceedings of the 21st ACM Interna-tional Conference on Information and Knowledge Management(CIKM rsquo12) pp 2025ndash2029 ACM November 2012

[18] Y Sun C C Aggarwal and J Han ldquoRelation strength-awareclustering of heterogeneous information networks with incom-plete attributesrdquoProceedings of the VLDBEndowment vol 5 no5 pp 394ndash405 2012

[19] M Zhang H Hu Z He andWWang ldquoTop-k similarity searchin heterogeneous information networks with x-star networkschemardquo Expert Systems with Applications vol 42 no 2 pp699ndash712 2015

[20] Y Zhou and L Liu ldquoSocial influence based clustering ofheterogeneous information networksrdquo in Proceedings of the19th ACM SIGKDD International Conference on KnowledgeDiscovery and Data Mining pp 338ndash346 ACM Chicago IllUSA August 2013

[21] L R Tucker ldquoSome mathematical notes on three-mode factoranalysisrdquo Psychometrika vol 31 pp 279ndash311 1966

[22] R A Harshman ldquoFoundations of the parafac procedure modeland conditions for an lsquoexplanatoryrsquo multi-mode factor analysisrdquoUCLAWorking Papers in Phonetics 1969

[23] H A L Kiers ldquoTowards a standardized notation and terminol-ogy in multiway analysisrdquo Journal of Chemometrics vol 14 no3 pp 105ndash122 2000

[24] T G Kolda and B W Bader ldquoTensor decompositions andapplicationsrdquo SIAM Review vol 51 no 3 pp 455ndash500 2009

Scientific Programming 13

[25] A Cichocki D Mandic L De Lathauwer et al ldquoTensordecompositions for signal processing applications from two-way to multiway component analysisrdquo IEEE Signal ProcessingMagazine vol 32 no 2 pp 145ndash163 2015

[26] W Peng and T Li ldquoTensor clustering via adaptive subspaceiterationrdquo Intelligent Data Analysis vol 15 no 5 pp 695ndash7132011

[27] J Hastad ldquoTensor rank is NP-completerdquo Journal of AlgorithmsCognition Informatics and Logic vol 11 no 4 pp 644ndash6541990

[28] S Metzler and P Miettinen ldquoClustering Boolean tensorsrdquo DataMining amp Knowledge Discovery vol 29 no 5 pp 1343ndash13732015

[29] X Cao X Wei Y Han and D Lin ldquoRobust face clustering viatensor decompositionrdquo IEEE Transactions on Cybernetics vol45 no 11 pp 2546ndash2557 2015

[30] I Sutskever R Salakhutdinov and J B Tenenbaum ldquoModellingrelational data using Bayesian clustered tensor factorizationrdquoin Proceedings of the 23rd Annual Conference on Neural Infor-mation Processing Systems (NIPS rsquo09) pp 1821ndash1828 BritishColumbia Canada December 2009

[31] B Ermis E Acar and A T Cemgil ldquoLink prediction in het-erogeneous data via generalized coupled tensor factorizationrdquoDataMining amp Knowledge Discovery vol 29 no 1 pp 203ndash2362015

[32] A R Benson D F Gleiche and J Leskovec ldquoTensor spectralclustering for partitioning higher-order network structuresrdquoin Proceedings of the SIAM International Conference on DataMining (SDM rsquo15) pp 118ndash126 Vancouver Canada May 2015

[33] L Xiong X Chen T-K Huang J G Schneider and JG Carbonell ldquoTemporal collaborative filtering with Bayesianprobabilistic tensor factorizationrdquo in Proceedings of the 10thSIAM International Conference on Data Mining (SDM rsquo10) pp211ndash222 Columbus Ohio USA May 2010

[34] E E Papalexakis L Akoglu and D Ience ldquoDo more views ofa graph help Community detection and clustering in multi-graphsrdquo in Proceedings of the 16th International Conferenceof Information Fusion (FUSION rsquo13) pp 899ndash905 IstanbulTurkey July 2013

[35] M Vandecappelle M Bousse F V Eeghem and L D Lath-auwer ldquoTensor decompositions for graph clusteringrdquo Inter-nal Report 16-170 ESAT-STADIUS KU Leuven LeuvenBelgium 2016 ftpftpesatkuleuvenbepubSISTAsistakul-akreports2016_Tensor_Graph_Clusteringpdf

[36] W Shao L He and P S Yu ldquoClustering on multi-sourceincomplete data via tensor modeling and factorizationrdquo inProceedings of the 19th Pacific-Asia Conference on KnowledgeDiscovery and Data Mining (PAKDD rsquo15) pp 485ndash497 Ho ChiMinh City Vietnam 2015

[37] J. D. Carroll and J.-J. Chang, "Analysis of individual differences in multidimensional scaling via an N-way generalization of 'Eckart-Young' decomposition," Psychometrika, vol. 35, no. 3, pp. 283–319, 1970.

[38] L. De Lathauwer, B. De Moor, and J. Vandewalle, "On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors," SIAM Journal on Matrix Analysis and Applications, vol. 21, no. 4, pp. 1324–1342, 2000.

[39] E. Acar, D. M. Dunlavy, and T. G. Kolda, "A scalable optimization approach for fitting canonical tensor decompositions," Journal of Chemometrics, vol. 25, no. 2, pp. 67–86, 2011.

[40] S. Hansen, T. Plantenga, and T. G. Kolda, "Newton-based optimization for Kullback-Leibler nonnegative tensor factorizations," Optimization Methods and Software, vol. 30, no. 5, pp. 1002–1029, 2015.

[41] N. Vervliet and L. De Lathauwer, "A randomized block sampling approach to canonical polyadic decomposition of large-scale tensors," IEEE Journal on Selected Topics in Signal Processing, vol. 10, no. 2, pp. 284–295, 2016.

[42] R. Ge, F. Huang, C. Jin, and Y. Yuan, "Escaping from saddle points—online stochastic gradient for tensor decomposition," https://arxiv.org/abs/1503.02101.

[43] T. G. Kolda, "Multilinear operators for higher-order decompositions," Tech. Rep. SAND2006-2081, Sandia National Laboratories, 2006, http://www.osti.gov/scitech/biblio/923081.

[44] P. Paatero, "A weighted non-negative least squares algorithm for three-way 'PARAFAC' factor analysis," Chemometrics and Intelligent Laboratory Systems, vol. 38, no. 2, pp. 223–242, 1997.

[45] P. Paatero, "Construction and analysis of degenerate PARAFAC models," Journal of Chemometrics, vol. 14, no. 3, pp. 285–299, 2000.

[46] L. Bottou and N. Murata, "Stochastic approximations and efficient learning," in The Handbook of Brain Theory and Neural Networks, 2nd edition, 2002.

[47] B. W. Bader and T. G. Kolda, "Efficient MATLAB computations with sparse and factored tensors," SIAM Journal on Scientific Computing, vol. 30, no. 1, pp. 205–231, 2007.

[48] A. Strehl and J. Ghosh, "Cluster ensembles—a knowledge reuse framework for combining multiple partitions," Journal of Machine Learning Research, vol. 3, no. 3, pp. 583–617, 2003.
