
Kronecker-Basis-Representation Based Tensor Sparsity and Its Applications to Tensor Recovery

Qi Xie, Qian Zhao, Deyu Meng and Zongben Xu

Abstract—As a promising way of analyzing data, sparse modeling has achieved great success throughout science and engineering. It is well known that the sparsity/low-rankness of a vector/matrix can be rationally measured by its number of nonzero entries (l0 norm)/nonzero singular values (rank), respectively. However, data from real applications are often generated by the interaction of multiple factors, which obviously cannot be sufficiently represented by a vector/matrix, while a high-order tensor is expected to provide a more faithful representation that delivers the intrinsic structure underlying such data ensembles. Unlike the vector/matrix case, constructing a rational high-order sparsity measure for a tensor is a relatively harder task. To this aim, in this paper we propose a measure for tensor sparsity, called the Kronecker-basis-representation based tensor sparsity measure (KBR for short), which encodes both sparsity insights delivered by the Tucker and CANDECOMP/PARAFAC (CP) low-rank decompositions for a general tensor. We then study the KBR regularization minimization (KBRM) problem and design an effective ADMM algorithm for solving it, in which each involved variable can be updated with closed-form equations. Such an efficient solver makes it possible to extend KBR to various tasks like tensor completion and tensor robust principal component analysis. A series of experiments, including multispectral image (MSI) denoising, MSI completion, and background subtraction, substantiate the superiority of the proposed methods beyond state-of-the-art methods.

Index Terms—Tensor sparsity, Tucker decomposition, CANDECOMP/PARAFAC decomposition, tensor completion, multi-spectral image restoration.


1 INTRODUCTION

SPARSITY is a common information representation property, meaning that an observation can be represented by a few atoms of an appropriately chosen dictionary. So far, sparsity-based methods have achieved great success throughout science and engineering. Typical examples include face modeling [47], [60], gene categorization [54], image/video compressive sensing [3], [10], user interest prediction [20], and signal restoration [47], [68], [75].

It is well known that the sparsity/low-rankness of a vector/matrix can be rationally measured by the number of its nonzero entries/nonzero singular values (l0 norm/rank). Such sparsity measures, as well as their relaxation forms (e.g., the l1 norm and the nuclear norm), have been shown to be helpful for finely encoding the data sparsity in applications, and have inspired various sparse/low-rank models and algorithms for different practical problems [6], [9], [14], [26], [47], [50], [68], [75].

However, data from many real applications are often generated by the interaction of multiple factors. The traditional vector or matrix, which can only well address single- or binary-factor variability of data, is obviously not the best way to keep the multi-factor structure of this kind of data. For example, a multispectral image (MSI) consists of a collection of images scattered over various discrete bands and thus involves three intrinsic constituent factors, i.e., spectrum, spatial width, and spatial height. Expressing an MSI as a matrix will inevitably damage its three-factor structure, with only two factors considered. Instead of a vector or matrix, a higher-order tensor, represented as a multidimensional array, provides a more faithful representation that delivers the intrinsic structure underlying such data ensembles. Techniques on tensors have thus been attracting much attention recently and have helped enhance the performance of various practical tasks, such as MSI denoising [69], 3D image reconstruction [59], and higher-order web link analysis [32].

Fig. 1. Illustration of correlation priors on each mode of an MSI. (a) A real MSI of size 80 × 80 × 30. (b) Singular value curves of the matrices unfolded along the three tensor modes. The low-rank property of the subspace along each mode can be easily observed from the dramatic decrease of these curves.

This work was supported by the National Natural Science Foundation of China under Grants No. 61661166011, 61373114, 11690011, and 61603292, the 973 Program of China under Grant No. 2013CB329404, and the Macau Science and Technology Development Fund under Grant No. 003/2016/AFJ. Qi Xie, Qian Zhao, Deyu Meng (corresponding author), and Zongben Xu are with the School of Mathematics and Statistics and the Ministry of Education Key Lab of Intelligent Networks and Network Security, Xi'an Jiaotong University, Shaanxi, P.R. China. Email: [email protected], {timmy.zhaoqian, dymeng, zbxu}@mail.xjtu.edu.cn.

A tensor collected from real scenarios almost always exhibits evident correlations along each of its modes. Taking the MSI shown in Fig. 1 as an example, the correlation along each of its spectral and spatial modes can be evidently observed both quantitatively and visually. This reflects the fact that the tensor along each mode resides in a low-rank subspace, and that the entire tensor corresponds to the affiliation of the subspaces along all tensor modes. Thus, in order to faithfully deliver such sparsity knowledge underlying a tensor and enhance the performance of sparsity-based tensor recovery techniques, it is crucial to consider a quantitative measure for assessing tensor sparsity.

Mathematically, a sparsity-based tensor recovery model can generally be expressed as follows:
$$\min_{\mathcal{X}}\; S(\mathcal{X}) + \gamma L(\mathcal{X},\mathcal{Y}), \qquad (1)$$
where Y ∈ R^{I1×I2×···×IN} is the observation, L(X, Y) is the loss function between X and Y, S(X) defines the tensor sparsity measure of X, and γ is the compromise parameter. It is easy to see that the key problem in constructing (1) is to design an appropriate tensor sparsity measure on the data. Unlike the vector/matrix cases, extracting a rational high-order sparsity measure for a tensor is a much harder task.

Most current works directly extend the matrix rank to higher orders by simply summing up the ranks (or their relaxations) along all tensor modes [11], [24], [40], [57]. However, different from the matrix scenario, this simple rank-summation term is generally short of a clear physical meaning for tensors. Specifically, tensor sparsity should be interpreted beyond the low-rank property of all its unfolded subspaces; more importantly, it should consider how such subspace sparsities are affiliated over the entire tensor structure. Besides, such a simple tensor sparsity measure more or less lacks a consistent relationship with the previous sparsity definitions for vectors and matrices.

To handle the aforementioned issues, this paper makes three main contributions. Firstly, a tensor sparsity measure is proposed whose insight can be easily interpreted as a regularization on the number of rank-1 Kronecker bases needed to represent the tensor. We thus call it the Kronecker-basis-representation based tensor sparsity measure (KBR) for convenience. Such a measure not only unifies the traditional understanding of sparsity from vectors (1-order tensors) to matrices (2-order tensors), but also encodes both sparsity insights delivered by the most typical Tucker and CP low-rank decompositions for a general tensor.

Secondly, we design an effective alternating direction method of multipliers (ADMM) algorithm [4], [38] for solving the KBR-based minimization (KBRM) problem (1). We further deduce closed-form equations for updating each involved variable, which makes the algorithm efficient to implement. This solver facilitates a general utilization of KBR regularization in more tensor analysis tasks. Specifically, we extend KBRM to two of the most typical tensor recovery problems and propose the KBR-based tensor completion (KBR-TC) model and the KBR-based robust principal component analysis (KBR-RPCA) model. Similar ADMM algorithms are designed to solve these models by using the proposed KBRM solver.

Thirdly, we adapt the proposed KBR-based models to multiple MSI/video tasks. For MSI denoising, we propose a new tensor-based denoising model that exploits the proposed KBR measure to encode the inherent spatial nonlocal similarity and spectral correlation in an MSI, and performs KBRM to recover the clean MSI. The model has a concise formulation and can be easily extended to other MSI recovery problems. Experiments on benchmark and real MSI data show that the proposed method achieves state-of-the-art MSI denoising performance under comprehensive quality assessments. In MSI completion and background subtraction experiments, the proposed KBR-TC and KBR-RPCA models also achieve better performance than traditional algorithms.

Throughout the paper, we denote scalars, vectors, matrices, and tensors as non-bold lowercase, bold lowercase, uppercase, and calligraphic uppercase letters, respectively.

2 NOTIONS AND PRELIMINARIES

We first introduce some necessary notions and preliminariesas follows.

A tensor of order N is denoted as A ∈ R^{I1×I2×···×IN}. Elements of A are denoted as a_{i1···in···iN}, where 1 ≤ in ≤ In. The mode-n vectors of an N-order tensor A are the In-dimensional vectors obtained from A by varying the index in while keeping the other indices fixed. The unfolding matrix A_(n) = unfold_n(A) ∈ R^{In×(I1···In−1 In+1···IN)} is composed by taking the mode-n vectors of A as its columns; this matrix can also be naturally seen as the mode-n flattening of A. Conversely, the unfolding matrix along the nth mode can be transformed back into the tensor by A = fold_n(A_(n)), 1 ≤ n ≤ N. The n-rank of A, denoted r_n, is the dimension of the vector space spanned by the mode-n vectors of A.

The mode-n product of a tensor A ∈ R^{I1×I2×···×IN} by a matrix B ∈ R^{Jn×In}, denoted A ×n B, is an N-order tensor C ∈ R^{I1×···×Jn×···×IN} with entries:
$$c_{i_1\cdots i_{n-1}\,j_n\,i_{n+1}\cdots i_N}=\sum_{i_n}a_{i_1\cdots i_n\cdots i_N}\,b_{j_n i_n}.$$
The mode-n product C = A ×n B can also be calculated by the matrix multiplication C_(n) = B A_(n), followed by the re-tensorization undoing the mode-n flattening. For convenience, we define A ×_{−n} {B_j}_{j=1}^{N} as
$$\mathcal{A}\times_1 B_1\times_2\cdots\times_{n-1}B_{n-1}\times_{n+1}B_{n+1}\cdots\times_N B_N.$$
The Frobenius norm of a tensor A is $\|\mathcal{A}\|_F=\big(\sum_{i_1,\cdots,i_N}|a_{i_1,\cdots,i_N}|^2\big)^{1/2}$.

We call a tensor A ∈ R^{I1×I2×···×IN} rank-1 if it can be written as the outer product of N vectors, i.e.,
$$\mathcal{A}=a^{(1)}\circ a^{(2)}\circ\cdots\circ a^{(N)},$$
where ∘ represents the vector outer product. This means that each element of the tensor is the product of the corresponding vector elements:
$$a_{i_1,i_2,\cdots,i_N}=a^{(1)}_{i_1}a^{(2)}_{i_2}\cdots a^{(N)}_{i_N},\quad \forall\,1\le i_n\le I_n. \qquad (2)$$
Such a simple rank-1 tensor is also called a Kronecker basis of the tensor space; e.g., in the 2D case, a Kronecker basis is expressed as the outer product uv^T of two vectors u and v.
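To make these operations concrete, the following minimal numpy sketch (our own illustration, not code from the authors' released implementation) realizes unfold_n, fold_n, and the mode-n product exactly as defined above; these helpers are reused in the later sketches.

    import numpy as np

    def unfold(A, n):
        """Mode-n unfolding A_(n): the mode-n fibers of A become the columns."""
        return np.moveaxis(A, n, 0).reshape(A.shape[n], -1)

    def fold(M, n, shape):
        """Inverse of unfold: re-tensorize a mode-n unfolding back to `shape`."""
        rest = [s for i, s in enumerate(shape) if i != n]
        return np.moveaxis(M.reshape([shape[n]] + rest), 0, n)

    def mode_n_product(A, B, n):
        """Mode-n product A x_n B, computed as fold_n(B @ A_(n))."""
        shape = list(A.shape)
        shape[n] = B.shape[0]
        return fold(B @ unfold(A, n), n, tuple(shape))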

3 RELATED WORKS ON TENSOR SPARSITY

In this section, we first review some sparsity-based tensor recovery methods proposed in the previous literature, and then briefly review two particular tensor decompositions, both containing insightful understandings of tensor sparsity.

3.1 Previous work on tensor recovery

The simplest way to deal with the tensor recovery problem is to unfold the objective tensor along a reasonable mode, transforming it into a 2-order matrix, and then perform matrix recovery on it. Most matrix recovery methods are constructed based on low-rank matrix approximation (LRMA), which aims to recover the underlying low-rank matrix from its degraded observation. The LRMA problem can be categorized into: low-rank matrix factorization (LRMF), aiming to factorize the objective matrix into two flat matrices, and rank minimization, aiming to reconstruct the data matrix by imposing an additional rank constraint upon the estimated matrix. Both of these two categories have a variety of extensions, as proposed in [5], [6], [19], [30], [61]. Furthermore, many matrix recovery methods are constructed by adding more prior knowledge related to the data together with matrix low-rankness [16], [22], such as joint trace/TV based MSI recovery [23], [27] and joint low-rank/sparsity based matrix recovery [55], [56].

Matrix completion (MC) and matrix-based robust principal component analysis (RPCA) are two of the most typical matrix recovery problems. The MC problem has recently arisen in multiple applications, including collaborative filtering and latent semantic analysis [7], [19], [36]. Candes and Recht [8] prompted a new surge of interest in this problem by showing that a matrix can be exactly recovered from an incomplete set of entries by solving a convex semidefinite program. Various ameliorations have been further proposed for this task [28], [38]. The RPCA problem was initially formulated by Wright et al. [68], with the theoretical guarantee of exactly recovering the ground truth matrix from its grossly corrupted observation under certain assumptions [6]. To speed up computation, Lin et al. proposed the accelerated proximal gradient (APG) [37] and the augmented Lagrangian multiplier (ALM) [36] methods. Bayesian RPCA approaches have also been investigated in [2], [15], [75].

Albeit easy to compute, unfolding a tensor into a matrix inevitably destroys the intrinsic structure underlying the tensor. For example, unfolding an MSI along its spectral dimensionality damages the spatial information of every band in the MSI. Thus, recovering a tensor by imposing sparsity directly on tensors has been attracting increasing research interest in recent years. Different from the matrix case with its natural sparsity measure (rank), the tensor case is more complicated, and it is crucial to construct a rational sparsity measure to describe the intrinsic correlations across various tensor modes.

Corresponding to the MC and RPCA issues for matrices, investigations on both tensor completion (TC) and tensor-based RPCA (TRPCA) have been made in the recent decade. E.g., Liu et al. [39], [40] proposed a high-order tensor sparsity measure as the weighted sum of the ranks (or their relaxations) of the unfolding matrices along all tensor modes, and applied it to TC tasks. Goldfarb and Qin [24] further applied a similar sparsity measure to the TRPCA problem. Romera-Paredes and Pontil [57] promoted this "sum of ranks" measure by relaxing it to a tighter convex form, and Cao et al. [11] further ameliorated it by using non-convex relaxations. Recently, Wang et al. [66] proposed a worst-case measure, i.e., measuring the sparsity of a tensor by the largest rank among all of its unfolding matrices, and relaxed it with a sum of exponential forms. Designed mainly for videos, Zhang et al. [74] developed a tensor-SVD (t-SVD) based high-order sparsity measure for both TC and TRPCA problems. Very recently, Lu et al. [42] further proved the exact-recovery property of the t-SVD-based TRPCA method.

Fig. 2. Illustration of (a) Tucker decomposition and (b) CP decomposition of X ∈ R^{I1×I2×I3}. (c) Vector case representation. (d) Matrix case decomposition.

It is easy to see that most current high-order sparsity measures are constructed based on a weighted sum of the ranks along all tensor modes. However, as mentioned before, constructing a tensor sparsity measure through this simple rank-summation term is short of a clear physical meaning for general tensors and lacks a consistent relationship with the previously defined sparsity measures for vectors/matrices. Besides, it is hard to decide the weight for penalizing the rank along each dimensionality. A common choice is to use the same weight on all modes, but this is not always rational. Still taking the MSI data as an example, the rank of an MSI along its spectral dimensionality should be much lower than those along its spatial dimensionalities. We should thus penalize the spectral rank more heavily than the spatial ones, i.e., we should set a larger weight on the former and smaller ones on the latter. Imposing similar weights on the ranks along all modes is therefore more or less irrational in many real scenarios.

The KBR measure was formally proposed in our previous work [69]. In this paper, we further extend the measure to the TC and TRPCA problems. More comprehensive experiments on MSI completion and background subtraction are presented to substantiate the superiority of the proposed method in a wide range of applications.

3.2 Tucker decomposition & CP decomposition

Tucker and CP decompositions are the two most typically adopted tensor decomposition forms. Tucker decomposition decomposes a tensor as an affiliation of the orthogonal bases along all its modes, integrated by a core coefficient tensor, while CP decomposes a tensor as a sum of rank-1 Kronecker bases, as visualized in Fig. 2.

Specifically, in Tucker decomposition [64], an N-order tensor X ∈ R^{I1×···×IN} can be written in the following form:
$$\mathcal{X}=\mathcal{S}\times_1 U_1\times_2 U_2\times_3\cdots\times_N U_N, \qquad (3)$$
where S ∈ R^{R1×···×RN} (ri ≤ Ri ≤ Ii) is called the core tensor, and Ui ∈ R^{Ii×Ri} (1 ≤ i ≤ N) is composed of the Ri orthogonal bases along the ith mode of X. When the subspace bases along each mode are sorted by their importance for representing the tensor, the elements of the core tensor exhibit an approximately descending order along each tensor mode. In this case, there is a nonzero block of size r1 × r2 × ··· × rN at the top left of the core tensor, as shown in Fig. 3, and the elements outside this nonzero block of the core tensor are all zeros.
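As a small illustration of this nonzero-block structure (a synthetic toy of our own; all sizes are chosen arbitrarily), one can verify numerically that a core whose nonzero block has size 2 × 2 × 2 yields an n-rank of 2 along every mode:

    import numpy as np

    # Toy check of Eq. (3): X = S x1 U1 x2 U2 x3 U3 with a 2x2x2 nonzero core block.
    rng = np.random.default_rng(0)
    S = np.zeros((4, 4, 4))
    S[:2, :2, :2] = rng.standard_normal((2, 2, 2))   # nonzero block of size 2x2x2
    U = [np.linalg.qr(rng.standard_normal((6, 4)))[0] for _ in range(3)]  # orthogonal bases
    X = S
    for n in range(3):
        X = mode_n_product(X, U[n], n)
    print([np.linalg.matrix_rank(unfold(X, n)) for n in range(3)])  # -> [2, 2, 2]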


Besides, the proposed KBR unifies the traditional sparsity measures from 1-order to 2-order. The traditional 1-order sparsity of a vector is measured by the number of bases (from a predefined dictionary) whose linear combination can represent the vector. Since in the vector case a Kronecker basis is just a common vector, this measurement is exactly the number of Kronecker bases required to represent the vector, which fully complies with our proposed sparsity measure and its physical meaning. The traditional 2-order sparsity of a matrix is assessed by its rank. If the rank of a matrix is r, then (i) the matrix can be represented by r Kronecker bases (each of the form uv^T), and (ii) the dimensions of the subspaces spanned along its two modes are both r, implying that the size of the nonzero block in the core tensor of its Tucker decomposition is r × r. It is easy to see that (i) and (ii) are respectively consistent with the first and second terms of KBR. The above analysis can be easily understood by observing Fig. 2.

4.1 Relaxation

Note that the l0 and rank terms in (5) can only take discrete values, which leads to a combinatorial optimization problem that is hard to solve in applications. We thus relax the KBR into a log-sum form to simplify computation. The effectiveness of such a relaxation has been substantiated in previous research [9], [26], [43], [62].

We can then obtain the following relaxed form of KBR:
$$S^{\star}(\mathcal{X})=t\,P_{ls}(\mathcal{S})+(1-t)\prod_{j=1}^{N}P^{*}_{ls}\big(X_{(j)}\big), \qquad (6)$$
where
$$P_{ls}(\mathcal{A})=\sum_{i_1,\ldots,i_N}\big(\log(|a_{i_1,\ldots,i_N}|+\varepsilon)-\log(\varepsilon)\big)/(-\log(\varepsilon)),$$
$$P^{*}_{ls}(A)=\sum_{m}\big(\log(\sigma_m(A)+\varepsilon)-\log(\varepsilon)\big)/(-\log(\varepsilon)),$$
are both of log-sum form (shifted to [0, +∞) and scaled by −log(ε) to better approach the l0 norm and rank, respectively), ε is a small positive number, and σm(A) denotes the mth singular value of A. In the later sections, we will use this relaxed form of KBR to build the KBR-based models.
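A direct numpy transcription of these two penalties and of the relaxed measure (6) reads as follows; this is a sketch reusing the `unfold` helper above, and the value of ε is an assumption (the paper only requires it to be a small positive number):

    import numpy as np

    def P_ls(A, eps=1e-3):
        """Log-sum surrogate of the l0 norm over all entries of tensor A."""
        return np.sum(np.log(np.abs(A) + eps) - np.log(eps)) / (-np.log(eps))

    def P_ls_star(M, eps=1e-3):
        """Log-sum surrogate of the rank of matrix M, summed over singular values."""
        sigma = np.linalg.svd(M, compute_uv=False)
        return np.sum(np.log(sigma + eps) - np.log(eps)) / (-np.log(eps))

    def kbr_relaxed(S, X, t=0.1, eps=1e-3):
        """Relaxed KBR measure S*(X) of Eq. (6): core sparsity term plus the
        product of log-sum ranks of all mode-j unfoldings."""
        prod = 1.0
        for j in range(X.ndim):
            prod *= P_ls_star(unfold(X, j), eps)
        return t * P_ls(S, eps) + (1.0 - t) * prod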

5 KBR-BASED MODEL & ITS SOLVING SCHEME

In this section, we first introduce the general solving scheme for the KBR-based minimization problem, and then introduce the KBR-based tensor completion and KBR-based robust principal component analysis extensions.

5.1 The KBR minimization model

The KBRM model corresponds to the fundamental KBR-based tensor recovery model, aiming at restoring a tensor from its observation under KBR regularization on tensor sparsity. By using the relaxed form S⋆(·) defined in (6), we have the following KBRM model:
$$\min_{\mathcal{X}}\;P_{ls}(\mathcal{S})+\lambda\prod_{j=1}^{N}P^{*}_{ls}\big(X_{(j)}\big)+\frac{\beta}{2}\|\mathcal{Y}-\mathcal{X}\|_F^2, \qquad (7)$$
where λ = (1−t)/t and β is the compromise parameter.

The alternating direction method of multipliers (ADMM) [4], [38], an effective strategy for solving large-scale optimization problems, is readily employed for solving (7). Firstly, we introduce N auxiliary tensors M_j (j = 1, 2, ..., N) and equivalently reformulate (7) as:
$$\begin{aligned}
\min_{\mathcal{S},\{\mathcal{M}_j\},\{U_j\}}\;& P_{ls}(\mathcal{S})+\lambda\prod_{j=1}^{N}P^{*}_{ls}\big(M_{j(j)}\big)+\frac{\beta}{2}\|\mathcal{Y}-\mathcal{S}\times_1U_1\cdots\times_NU_N\|_F^2\\
\text{s.t. }\;& \mathcal{S}\times_1U_1\cdots\times_NU_N-\mathcal{M}_j=0,\;\;U_j^TU_j=I,\;\;j=1,2,\cdots,N,
\end{aligned} \qquad (8)$$
where M_{j(j)} = unfold_j(M_j). Its augmented Lagrangian function then has the form:
$$\begin{aligned}
&L_\mu(\mathcal{S},\mathcal{M}_1,\cdots,\mathcal{M}_N,U_1,\cdots,U_N,\mathcal{P}_1,\cdots,\mathcal{P}_N)=P_{ls}(\mathcal{S})+\lambda\prod_{j=1}^{N}P^{*}_{ls}\big(M_{j(j)}\big)+\frac{\beta}{2}\|\mathcal{Y}-\mathcal{S}\times_1U_1\cdots\times_NU_N\|_F^2\\
&\quad+\sum_{j=1}^{N}\langle\mathcal{S}\times_1U_1\cdots\times_NU_N-\mathcal{M}_j,\mathcal{P}_j\rangle+\sum_{j=1}^{N}\frac{\mu}{2}\|\mathcal{S}\times_1U_1\cdots\times_NU_N-\mathcal{M}_j\|_F^2,
\end{aligned}$$
where the P_j are the Lagrange multipliers, μ is a positive scalar, and each U_j satisfies U_j^T U_j = I, ∀j = 1, 2, ..., N. Now we can solve the problem under the ADMM framework.

With the other parameters fixed, S can be updated by solving min_S L_μ(S, M_1, ..., M_N, U_1, ..., U_N, P_1, ..., P_N), i.e.:
$$\min_{\mathcal{S}}\;b\,P_{ls}(\mathcal{S})+\frac{1}{2}\|\mathcal{S}\times_1U_1\cdots\times_NU_N-\mathcal{O}\|_F^2, \qquad (9)$$
where $b=\frac{1}{\beta+N\mu}$ and $\mathcal{O}=\frac{\beta\mathcal{Y}+\sum_j(\mu\mathcal{M}_j-\mathcal{P}_j)}{\beta+N\mu}$. Since for any tensor D we have
$$\|\mathcal{D}\times_n V\|_F^2=\|\mathcal{D}\|_F^2,\quad\forall\,V^TV=I, \qquad (10)$$
by mode-j multiplying with U_j^T on each mode, problem (9) turns into the following problem:
$$\min_{\mathcal{S}}\;b\,P_{ls}(\mathcal{S})+\frac{1}{2}\|\mathcal{S}-\mathcal{Q}\|_F^2, \qquad (11)$$
where Q = O ×1 U_1^T ··· ×N U_N^T, which has been proved to have the closed-form solution [25]:
$$\mathcal{S}^{+}=D_{b,\varepsilon}(\mathcal{Q}). \qquad (12)$$
Here, D_{b,ε}(·) is the thresholding operator defined element-wise as:
$$D_{b,\varepsilon}(x)=\begin{cases}0, & \text{if } |x|\le 2\sqrt{c_0 b}-\varepsilon,\\[2pt] \mathrm{sign}(x)\,\dfrac{c_1(x)+c_2(x)}{2}, & \text{if } |x|> 2\sqrt{c_0 b}-\varepsilon,\end{cases} \qquad (13)$$
where $c_0=\frac{-1}{\log(\varepsilon)}$, $c_1(x)=|x|-\varepsilon$, and $c_2(x)=\sqrt{(|x|+\varepsilon)^2-4c_0b}$.
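In code, this operator acts element-wise and can be vectorized directly; below is a minimal numpy sketch of D_{b,ε} (the max-guard inside the square root only protects entries that are zeroed out anyway):

    import numpy as np

    def D_threshold(x, b, eps=1e-3):
        """Element-wise thresholding operator D_{b,eps} of Eq. (13)."""
        c0 = -1.0 / np.log(eps)
        c1 = np.abs(x) - eps
        # Guard the radicand: it is nonnegative exactly where the output is kept.
        c2 = np.sqrt(np.maximum((np.abs(x) + eps) ** 2 - 4.0 * c0 * b, 0.0))
        out = np.sign(x) * (c1 + c2) / 2.0
        return np.where(np.abs(x) > 2.0 * np.sqrt(c0 * b) - eps, out, 0.0)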

With U_j (j ≠ k) and the other parameters fixed, U_k (1 ≤ k ≤ N) can be updated by solving min_{U_k} L_μ(S, M_1, ..., M_N, U_1, ..., U_N, P_1, ..., P_N), i.e.:
$$\min_{U_k^TU_k=I}\;\|\mathcal{S}\times_1U_1\cdots\times_NU_N-\mathcal{O}\|_F^2. \qquad (14)$$
By employing (10) and the following equation:
$$\mathcal{B}=\mathcal{D}\times_n V\iff B_{(n)}=VD_{(n)}, \qquad (15)$$
we can obtain that (14) is equivalent to
$$\max_{U_k^TU_k=I}\;\langle A_k,U_k\rangle, \qquad (16)$$
where $A_k=O_{(k)}\big(\mathrm{unfold}_k\big(\mathcal{S}\times_{-k}\{U_i\}_{i=1}^{N}\big)\big)^T$. By using von Neumann's trace inequality [48], we can easily solve (16) and update U_k by the following formula:
$$U_k^{+}=B_kC_k^T, \qquad (17)$$
where A_k = B_k D C_k^T is the SVD of A_k.
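The update (17) is the classical orthogonal-Procrustes solution; a one-function sketch:

    import numpy as np

    def update_U(A_k):
        """Maximize <A_k, U> over U with orthonormal columns (Eqs. 16-17):
        if A_k = B_k D C_k^T is its SVD, the maximizer is B_k C_k^T."""
        B, _, Ct = np.linalg.svd(A_k, full_matrices=False)
        return B @ Ct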


Algorithm 1 Algorithm for KBRM
Input: observed tensor Y
1: Initialize U(0)_1, ..., U(0)_N and S(0) by the high-order SVD of Y; M(0)_k = Y, ∀k = 1, 2, ..., N; l = 1, ρ > 1, μ(0) > 0
2: while not converged do
3:   Update S(l) by (12)
4:   Update all U(l)_k by (17)
5:   Update all M(l)_k by (19)
6:   Update the multipliers by P(l)_k = P(l−1)_k + μ(l)(L − M(l−1)_k)
7:   Let μ(l) := ρ μ(l−1); l = l + 1
8: end while
Output: X = S(l) ×1 U(l)_1 ··· ×N U(l)_N

With M_j (j ≠ k) and the other parameters fixed, M_k can be updated by solving the following problem:
$$\min_{\mathcal{M}_k}\;a_k P^{*}_{ls}\big(M_{k(k)}\big)+\frac{1}{2}\Big\|\mathcal{L}+\frac{1}{\mu}\mathcal{P}_k-\mathcal{M}_k\Big\|_F^2, \qquad (18)$$
where $a_k=\frac{\lambda}{\mu}\prod_{j\ne k}P^{*}_{ls}\big(M_{j(j)}\big)$ and L = S ×1 U1 ··· ×N UN. By using Theorem 1 in [43], it is easy to deduce that (18) has the following closed-form solution:
$$\mathcal{M}_k^{+}=\mathrm{fold}_k\big(V_1\Sigma_{a_k}V_2^T\big), \qquad (19)$$
where Σ_{a_k} = diag(D_{a_k,ε}(σ1), D_{a_k,ε}(σ2), ···, D_{a_k,ε}(σn)) and V_1 diag(σ1, σ2, ..., σn) V_2^T is the SVD of unfold_k(L + (1/μ)P_k).
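Equation (19) amounts to a weighted singular-value thresholding of the mode-k unfolding; a sketch reusing `unfold`, `fold`, and `D_threshold` from above:

    import numpy as np

    def M_update(T, k, a_k, eps=1e-3):
        """Closed-form M_k update of Eq. (19): threshold the singular values of
        unfold_k(T) with D_{a_k,eps} and fold back; here T = L + P_k / mu."""
        V1, sigma, V2t = np.linalg.svd(unfold(T, k), full_matrices=False)
        return fold(V1 @ np.diag(D_threshold(sigma, a_k, eps)) @ V2t, k, T.shape)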

The proposed algorithm for KBRM is summarized in Algorithm 1. Note that the convergence of this ADMM algorithm is difficult to analyze due to the non-convexity of our model. By deductions similar to the results for general ADMM algorithms [4], [36], we can obtain the following weak convergence result¹ to facilitate the construction of a rational termination condition for Algorithm 1.

Remark 1. For the sequences S(l), M(l)_k, and U(l)_k, k = 1, 2, ..., N, generated by Algorithm 1, and X(l) = S(l) ×1 U(l)_1 ×2 ··· ×N U(l)_N, the iterates M(l)_k and X(l) satisfy:
$$\big\|\mathcal{X}^{(l)}-\mathcal{M}^{(l)}_k\big\|_F=O\big(\mu^{(0)}\rho^{-l/2}\big),\qquad \big\|\mathcal{X}^{(l+1)}-\mathcal{X}^{(l)}\big\|_F=O\big(\mu^{(0)}\rho^{-l/2}\big). \qquad (20)$$

Note that the remark implies an easy termination strategy for our algorithm: terminate when the variation between the reconstructed tensors in two adjacent steps is smaller than a preset threshold θ. The result not only guarantees that the algorithm stops within T = O((log(μ(0)/θ))/log(ρ)) iterations, but also implies that the equality constraints are finely approximated during the iterations. Similar results can be deduced for the other KBR algorithms.

¹. The proof is presented in http://dymeng.gr.xjtu.edu.cn/8.
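Putting the pieces together, the following sketch assembles Algorithm 1 from the helpers defined above. It is our own minimal illustration, not the authors' released Matlab code; the parameter defaults mirror the experimental settings reported in Section 6.1 and are otherwise assumptions, and the multiplier step uses the freshly updated M_k (the standard ADMM convention).

    import numpy as np

    def kbrm_admm(Y, lam=10.0, beta=1e-3, eps=1e-3, rho=1.05, mu=250.0,
                  theta=1e-4, max_iter=100):
        """Minimal sketch of Algorithm 1 (KBRM), reusing unfold/fold/mode_n_product,
        P_ls_star, D_threshold, update_U, and M_update defined above."""
        N = Y.ndim
        # HOSVD initialization: U_j from the SVD of each unfolding, S by projection.
        U = [np.linalg.svd(unfold(Y, j), full_matrices=False)[0] for j in range(N)]
        S = Y.copy()
        for j in range(N):
            S = mode_n_product(S, U[j].T, j)
        M = [Y.copy() for _ in range(N)]
        P = [np.zeros_like(Y) for _ in range(N)]
        X_prev = Y.copy()
        for _ in range(max_iter):
            # S update, Eqs. (9)-(12).
            O = (beta * Y + sum(mu * M[j] - P[j] for j in range(N))) / (beta + N * mu)
            Q = O
            for j in range(N):
                Q = mode_n_product(Q, U[j].T, j)
            S = D_threshold(Q, 1.0 / (beta + N * mu), eps)
            # U updates, Eqs. (14)-(17).
            for k in range(N):
                Sk = S
                for i in range(N):
                    if i != k:
                        Sk = mode_n_product(Sk, U[i], i)   # S x_{-k} {U_i}
                U[k] = update_U(unfold(O, k) @ unfold(Sk, k).T)
            # Reconstruct L = S x1 U1 ... xN UN.
            L = S
            for j in range(N):
                L = mode_n_product(L, U[j], j)
            # M and multiplier updates, Eqs. (18)-(19).
            for k in range(N):
                a_k = (lam / mu) * np.prod([P_ls_star(unfold(M[j], j), eps)
                                            for j in range(N) if j != k])
                M[k] = M_update(L + P[k] / mu, k, a_k, eps)
                P[k] = P[k] + mu * (L - M[k])
            mu *= rho
            # Termination test suggested by Remark 1.
            if np.linalg.norm(L - X_prev) < theta * max(np.linalg.norm(X_prev), 1.0):
                return L
            X_prev = L
        return L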

5.2 KBR for tensor completion

Tensor completion (TC) refers to the problem of recovering a tensor from only partial observations of its entries, which arises in a number of computer vision and pattern recognition applications, such as hyperspectral data recovery and video inpainting [35], [51]. It is a well-known ill-posed problem, which needs priors on the ground truth tensor as supplementary information for reconstruction. We can then utilize the proposed KBR regularizer to encode such prior knowledge as follows:
$$\min_{\mathcal{X}}\;S(\mathcal{X})\quad\text{s.t. }\mathcal{X}_\Omega=\mathcal{T}_\Omega, \qquad (21)$$
where X, T ∈ R^{I1×I2×···×IN} are the reconstructed and observed tensors, respectively; the elements of T in the set Ω are given while the remaining ones are missing; and S(·) is the tensor sparsity measure.

By using the relaxation S⋆(X) as defined in (6), we can get the following KBR-TC model:
$$\min_{\mathcal{X}}\;P_{ls}(\mathcal{S})+\lambda\prod_{j=1}^{N}P^{*}_{ls}\big(X_{(j)}\big)\quad\text{s.t. }\mathcal{X}_\Omega=\mathcal{T}_\Omega. \qquad (22)$$

Similar to the KBRM problem, we apply ADMM to solve (22). Firstly, we introduce N auxiliary tensors M_j (j = 1, 2, ..., N) and equivalently reformulate (22) as:
$$\begin{aligned}
\min_{\mathcal{S},\{\mathcal{M}_j\},U_j^TU_j=I}\;& P_{ls}(\mathcal{S})+\lambda\prod_{j=1}^{N}P^{*}_{ls}\big(M_{j(j)}\big)\\
\text{s.t. }\;& \mathcal{X}-\mathcal{S}\times_1U_1\cdots\times_NU_N=0,\;\;\mathcal{X}_\Omega-\mathcal{T}_\Omega=0,\;\;\mathcal{X}-\mathcal{M}_j=0,\;j=1,2,\cdots,N.
\end{aligned}$$
Its augmented Lagrangian function has the form:
$$\begin{aligned}
&L_\mu(\mathcal{S},\mathcal{M}_1,\cdots,\mathcal{M}_N,U_1,\cdots,U_N,\mathcal{X},\mathcal{P}_x,\mathcal{P}_t,\mathcal{P}_{m_1},\cdots,\mathcal{P}_{m_N})\\
&=P_{ls}(\mathcal{S})+\lambda\prod_{j=1}^{N}P^{*}_{ls}\big(M_{j(j)}\big)+\langle\mathcal{X}-\mathcal{S}\times_1U_1\cdots\times_NU_N,\mathcal{P}_x\rangle+\frac{\mu}{2}\|\mathcal{X}-\mathcal{S}\times_1U_1\cdots\times_NU_N\|_F^2\\
&\quad+\sum_{j=1}^{N}\langle\mathcal{X}-\mathcal{M}_j,\mathcal{P}_{m_j}\rangle+\sum_{j=1}^{N}\frac{\mu}{2}\|\mathcal{X}-\mathcal{M}_j\|_F^2+\langle\mathcal{X}_\Omega-\mathcal{T}_\Omega,\mathcal{P}_{t\Omega}\rangle+\frac{\mu}{2}\|\mathcal{X}_\Omega-\mathcal{T}_\Omega\|_F^2,
\end{aligned}$$
where P_x, P_t, and P_{m_j}, ∀j = 1, 2, ..., N, are the Lagrange multipliers, μ is a positive scalar, and each U_j must satisfy U_j^T U_j = I. Then we can solve the problem under the ADMM framework.

With the other parameters fixed, S can be updated by solving a subproblem similar to (11), which has the following closed-form solution:
$$\mathcal{S}^{+}=D_{b,\varepsilon}\big((\mathcal{X}+\mu^{-1}\mathcal{P}_x)\times_1U_1^T\cdots\times_NU_N^T\big), \qquad (23)$$
where b = 1/μ.

When updating U_k (k = 1, 2, ..., N) with the other parameters fixed, similar to the derivation of (17), we can obtain that U_k is updated by
$$U_k^{+}=B_kC_k^T, \qquad (24)$$
where A_k = B_k D C_k^T is the SVD of A_k, and
$$A_k=\mathrm{unfold}_k\big(\mathcal{X}+\mu^{-1}\mathcal{P}_x\big)\,\mathrm{unfold}_k\big(\mathcal{S}\times_{-k}\{U_j\}_{j=1}^{N}\big)^T.$$

When updating M_k (k = 1, 2, ..., N) with the other parameters fixed, similar to the derivation of (19), we can obtain
$$\mathcal{M}_k^{+}=\mathrm{fold}_k\big(V_1\Sigma_{a_k}V_2^T\big), \qquad (25)$$
where Σ_{a_k} = diag(D_{a_k,ε}(σ1), D_{a_k,ε}(σ2), ···, D_{a_k,ε}(σn)), V_1 diag(σ1, ..., σn) V_2^T is the SVD of unfold_k(X + μ^{-1}P_{m_k}), and a_k = (λ/μ) ∏_{j≠k} P*_{ls}(M_{j(j)}).

With the other parameters fixed, X can be updated by minimizing the augmented Lagrangian function over X, which is equivalent to solving the following problem:
$$\min_{\mathcal{X}}\;\Big\|\mathcal{X}-\frac{1}{(N+1)\mu}\Big(\mu\mathcal{L}-\mathcal{P}_x+\sum_j(\mu\mathcal{M}_j-\mathcal{P}_{m_j})\Big)\Big\|_F^2+\frac{1}{N+1}\big\|\mathcal{X}_\Omega+\mu^{-1}\mathcal{P}_{t\Omega}-\mathcal{T}_\Omega\big\|_F^2, \qquad (26)$$
where L = S ×1 U1 ··· ×N UN. Its closed-form solution is:
$$\mathcal{X}^{+}_{\Omega^{\perp}}=\frac{1}{(N+1)\mu}\Big(\mu\mathcal{L}-\mathcal{P}_x+\sum_j(\mu\mathcal{M}_j-\mathcal{P}_{m_j})\Big)_{\Omega^{\perp}},\qquad
\mathcal{X}^{+}_{\Omega}=\frac{1}{(N+2)\mu}\Big(\mu\mathcal{L}-\mathcal{P}_x+\mu\mathcal{T}-\mathcal{P}_t+\sum_j(\mu\mathcal{M}_j-\mathcal{P}_{m_j})\Big)_{\Omega}, \qquad (27)$$
where Ω⊥ denotes the complement of Ω.

The proposed algorithm for KBR-TC can then be summarized as Algorithm 2.

Algorithm 2 Algorithm for KBR-TC
Input: observed tensor T
1: Initialize X(0)_Ω = T_Ω and X(0)_{Ω⊥} = mean(T_Ω); initialize S(0) and U(0)_1, ..., U(0)_N by the high-order SVD of X(0); M(0)_k = X(0), ∀k = 1, 2, ..., N; l = 1, ρ > 1, μ(0) > 0
2: while not converged do
3:   Update S(l) by (23)
4:   Update all U(l)_k by (24)
5:   Update all M(l)_k by (25)
6:   Update X(l) by (27)
7:   Update the multipliers by P(l)_k = P(l−1)_k + μ(l)(L − M(l−1)_k)
8:   Let μ(l) := ρ μ(l−1); l = l + 1
9: end while
Output: X = S(l) ×1 U(l)_1 ··· ×N U(l)_N
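The only genuinely new step relative to Algorithm 1 is the X update (27); a hedged sketch (variable names are illustrative):

    import numpy as np

    def update_X(L, T, mask, M_list, P_x, P_t, P_m, mu):
        """Closed-form X update of Eq. (27) for KBR-TC. `mask` is a boolean
        indicator of the observed set Omega; L = S x1 U1 ... xN UN."""
        N = len(M_list)
        num = mu * L - P_x + sum(mu * Mj - Pmj for Mj, Pmj in zip(M_list, P_m))
        X = num / ((N + 1) * mu)                        # entries on Omega^perp
        X_obs = (num + mu * T - P_t) / ((N + 2) * mu)   # entries on Omega
        X[mask] = X_obs[mask]
        return X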

5.3 KBR for tensor robust PCA

Tensor robust PCA (TRPCA) aims to recover a tensor from grossly corrupted observations, i.e.,
$$\mathcal{T}=\mathcal{L}+\mathcal{E}, \qquad (28)$$
where L is the main tensor and E corresponds to the noises/outliers embedded in the data. Using the proposed KBR measure, we can get the following TRPCA model:
$$\min_{\mathcal{L},\mathcal{E}}\;S(\mathcal{L})+\beta\|\mathcal{E}\|_1,\quad\text{s.t. }\mathcal{T}=\mathcal{L}+\mathcal{E}, \qquad (29)$$
where β is a tuning parameter compromising between the recovered tensor and the noise term.

A more general noise assumption utilized in modeling the RPCA problem is to consider the noise as a mixture of sparse noise and Gaussian noise [72]. Correspondingly, we can ameliorate the observation expression as:
$$\mathcal{T}=\mathcal{L}+\mathcal{E}+\mathcal{N}, \qquad (30)$$
where N is additional Gaussian noise embedded in the tensor. We can then propose the KBR-RPCA model:
$$\min_{\mathcal{L},\mathcal{E},\mathcal{N}}\;P_{ls}(\mathcal{S})+\lambda\prod_{j=1}^{N}P^{*}_{ls}\big(L_{(j)}\big)+\beta\|\mathcal{E}\|_1+\frac{\gamma}{2}\|\mathcal{T}-\mathcal{L}-\mathcal{E}\|_F^2, \qquad (31)$$
where L = S ×1 U1 ··· ×N UN is the Tucker decomposition of L.

We also apply ADMM to solve (31). Firstly, we introduce N auxiliary tensors M_j (j = 1, 2, ..., N) and equivalently reformulate (31) as follows:
$$\begin{aligned}
\min_{\mathcal{S},\{\mathcal{M}_j\},U_j^TU_j=I}\;& P_{ls}(\mathcal{S})+\lambda\prod_{j=1}^{N}P^{*}_{ls}\big(M_{j(j)}\big)+\beta\|\mathcal{E}\|_1+\frac{\gamma}{2}\|\mathcal{S}\times_1U_1\cdots\times_NU_N+\mathcal{E}-\mathcal{T}\|_F^2\\
\text{s.t. }\;& \mathcal{S}\times_1U_1\cdots\times_NU_N-\mathcal{M}_j=0,\;\forall j=1,2,\cdots,N,
\end{aligned} \qquad (32)$$
and its augmented Lagrangian function has the form:
$$\begin{aligned}
&L_\mu(\mathcal{S},\mathcal{M}_1,\cdots,\mathcal{M}_N,U_1,\cdots,U_N,\mathcal{E},\mathcal{P}_1,\cdots,\mathcal{P}_N)=P_{ls}(\mathcal{S})+\lambda\prod_{j=1}^{N}P^{*}_{ls}\big(M_{j(j)}\big)+\beta\|\mathcal{E}\|_1+\frac{\gamma}{2}\|\mathcal{S}\times_1U_1\cdots\times_NU_N+\mathcal{E}-\mathcal{T}\|_F^2\\
&\quad+\sum_{j=1}^{N}\langle\mathcal{S}\times_1U_1\cdots\times_NU_N-\mathcal{M}_j,\mathcal{P}_j\rangle+\sum_{j=1}^{N}\frac{\mu}{2}\|\mathcal{S}\times_1U_1\cdots\times_NU_N-\mathcal{M}_j\|_F^2,
\end{aligned}$$
where the P_j are the Lagrange multipliers, μ is a positive scalar, and each U_j satisfies U_j^T U_j = I, ∀j = 1, 2, ..., N. Now we can solve the problem under the ADMM framework.

With the other parameters fixed, S can be updated by solving a subproblem similar to (11), which has the following closed-form solution:
$$\mathcal{S}^{+}=D_{b,\varepsilon}(\mathcal{Q}), \qquad (33)$$
where $b=\frac{1}{\gamma+N\mu}$, Q = O ×1 U_1^T ··· ×N U_N^T, and $\mathcal{O}=\frac{\gamma(\mathcal{T}-\mathcal{E})+\sum_j(\mu\mathcal{M}_j-\mathcal{P}_j)}{\gamma+N\mu}$.

γ+Nµ .When updating Uk (k = 1, 2, · · · , N) with other param-

eters fixed, similar to the derivation of (17), we can obtainthe updating equation of Uk as

U+k = BkCk

T , (34)

where Ak = BkDCTk is the SVD decomposition of Ak, and

Ak = O(k)

(unfoldk

(S×−kUiNi=1

))T.

When updating M_k (k = 1, 2, ..., N) with the other parameters fixed, similar to the derivation of (19), we can obtain that M_k is updated by
$$\mathcal{M}_k^{+}=\mathrm{fold}_k\big(V_1\Sigma_{a_k}V_2^T\big), \qquad (35)$$
where Σ_{a_k} = diag(D_{a_k,ε}(σ1), D_{a_k,ε}(σ2), ···, D_{a_k,ε}(σn)), V_1 diag(σ1, ..., σn) V_2^T is the SVD of unfold_k(S ×1 U1 ··· ×N UN + μ^{-1}P_k), and a_k = (λ/μ) ∏_{j≠k} P*_{ls}(M_{j(j)}).

With the other parameters fixed, E can be updated by minimizing the augmented Lagrangian function over E, which is equivalent to solving the following problem:
$$\min_{\mathcal{E}}\;\beta\|\mathcal{E}\|_1+\frac{\gamma}{2}\|\mathcal{S}\times_1U_1\cdots\times_NU_N+\mathcal{E}-\mathcal{T}\|_F^2,$$
which has the following closed-form solution [63]:
$$\mathcal{E}^{+}=\mathcal{S}_{\beta/\gamma}\big(\mathcal{T}-\mathcal{S}\times_1U_1\cdots\times_NU_N\big), \qquad (36)$$
where S_τ(·) is the soft thresholding operator [63]:
$$\mathcal{S}_\tau(x)=\begin{cases}0, & \text{if } |x|\le\tau,\\ \mathrm{sign}(x)(|x|-\tau), & \text{if } |x|>\tau.\end{cases} \qquad (37)$$
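The soft thresholding operator has the familiar one-line vectorized form (a standard implementation, shown for completeness):

    import numpy as np

    def soft_threshold(x, tau):
        """Soft thresholding S_tau of Eq. (37): the proximal mapping of the l1
        norm, used in the closed-form E update of Eq. (36)."""
        return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)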

The KBR-RPCA algorithm is summarized as Algorithm 3.

5.4 Applying the KBRM algorithm to MSI denoising

In this section, we briefly introduce how to apply the proposed KBRM algorithm to MSI denoising; more details can be found in our conference paper [69].


Algorithm 3 Algorithm for KBR-RPCA
Input: observed tensor T
1: Initialize U(0)_1, ..., U(0)_N and S(0) by the high-order SVD of T; M(0)_k = T, ∀k = 1, 2, ..., N; E = 0; l = 1, ρ > 1, μ(0) > 0
2: while not converged do
3:   Update S(l) by (33)
4:   Update all U(l)_k by (34)
5:   Update all M(l)_k by (35)
6:   Update E(l) by (36)
7:   Update the multipliers by P(l)_k = P(l−1)_k + μ(l)(L − M(l−1)_k)
8:   Let μ(l) := ρ μ(l−1); l = l + 1
9: end while
Output: L = S(l) ×1 U(l)_1 ··· ×N U(l)_N, E, and N = T − L − E(l)

Fig. 5. Flowchart of the proposed MSI denoising method.

The most significant issue in recovering a clean MSI from its corruption is to fully utilize the prior structure knowledge of the to-be-reconstructed MSI. The two most commonly utilized prior structures for MSI recovery are global correlation along the spectrum (GCS) and nonlocal self-similarity across space (NSS). The GCS prior indicates that an MSI contains a large amount of spectral redundancy, and the images obtained across the MSI spectrum are generally highly correlated. The NSS prior refers to the fact that, for a given local full-band patch (FBP) of an MSI (which is stacked by the patches at the same location of the MSI over all bands), there are many FBPs similar to it. It has been extensively shown that these two types of prior knowledge are indeed possessed by a wide range of natural MSIs and are very helpful for various MSI recovery issues [12], [29], [52], [76].

Albeit demonstrated to be effective in certain MSI denoising cases, most current methods for this task take only one of these priors into full consideration. In [69], a new denoising framework, as shown in Fig. 5, was proposed. Through block matching of full-band patches (FBPs), we can obtain a group of similar FBPs, and by unfolding each FBP along the spectral dimensionality within this group, we can represent each FBP group correlated to the ith key FBP as a 3-order tensor X_i. Both GCS and NSS knowledge are well preserved and reflected by such a representation, along the spectral and nonlocal-similar-patch-number modes of X_i, respectively. Thus, the high-order sparsity of all X_i can well encode both the GCS and NSS of the whole MSI. We can then estimate the true nonlocal-similarity FBPs X_i from their corruptions Y_i by solving the following optimization problem:
$$\hat{\mathcal{X}}_i=\arg\min_{\mathcal{X}}\;S^{\star}(\mathcal{X})+\frac{\gamma}{2}\|\mathcal{Y}_i-\mathcal{X}\|_F^2, \qquad (38)$$
which is a KBRM problem and can be efficiently solved by Algorithm 1. By aggregating all the reconstructed X_i, we can reconstruct the estimated MSI. The entire denoising process is summarized in Algorithm 4 and visualized in Fig. 5. We denote the algorithm as KBR-denoising for convenience.

Algorithm 4 Algorithm for KBR-denoising
Input: noisy MSI Y
1: Initialize X(0) = Y
2: for l = 1 : L do
3:   Calculate Y(l) = X(l−1) + δ(Y − X(l−1))
4:   Construct the entire FBP set Ω_{Y(l)}
5:   Construct the set of similar-FBP groups {Y_i}, i = 1, ..., K
6:   for each FBP group Y_i do
7:     Solve problem (38) by Algorithm 1
8:   end for
9:   Aggregate {X_i}, i = 1, ..., K, to form the clean image X(l)
10: end for
Output: denoised MSI X(L)
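For a single similar-FBP group, step 7 of Algorithm 4 is exactly one KBRM solve; the snippet below illustrates this with the `kbrm_admm` sketch from Section 5.1 (the group size follows Section 5.5 and the β = c v⁻¹ rule follows Section 6.1; the random data and noise level are purely illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    Y_i = rng.standard_normal((36, 60, 31))     # one noisy FBP group: p^2 x s x bands
    v = 0.1                                     # assumed noise variance
    gamma = 1e-3 / v                            # beta = c * v^{-1} with c = 1e-3
    X_i = kbrm_admm(Y_i, lam=10.0, beta=gamma)  # solve problem (38) for this group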

5.5 Computational complexity analysis

Considering Algorithm 1 with an N-order input tensor Y ∈ R^{I1×···×IN}, the main per-iteration cost lies in the updates of U_j and M_j, j = 1, 2, ..., N. Updating U_j requires computing an SVD of an I_j × I_j matrix, and updating M_j requires computing an SVD of an I_j × (∏_{i≠j} I_i) matrix. Meanwhile, about O((log(μ(0)/θ))/log(ρ)) iterations are needed, as deduced in Section 5.1. It is easy to see that Algorithms 2 and 3 possess similar computational complexity.

Note that the per-iteration complexity of our algorithms is comparable to that of the rank-sum-based tensor recovery algorithms [39], [40], which need to perform N SVDs of matrices of size I_j × (∏_{i≠j} I_i), j = 1, 2, ..., N, in each iteration. When N = 3, it is also comparable to the t-SVD-based TC/TRPCA methods [42], [74], which need to compute I_3 SVDs of I_1 × I_2 matrices in each iteration.

For Algorithm 4 with an input MSI Y ∈ R^{I1×I2×I3}, the number of FBP groups is K = O(I1 I2), and the size of each FBP group is p² × s × I3 (in our experiments, 36 × 60 × 31), where p is the patch size and s is the number of FBPs in each similar group. The computation cost is thus not very small for quite large K. However, the denoising of the K FBP groups can be processed in parallel, each with relatively small computational complexity.

6 EXPERIMENTAL RESULTS

In this section, we test the performance of the KBRM, KBR-TC, and KBR-RPCA methods on MSI denoising, MSI completion, and background subtraction tasks, respectively.²

6.1 MSI denoising experiments

We first test the KBR-denoising method on MSI denoising with synthetic noises, for quantitative performance comparison of all competing methods, and then test all methods on real MSI data.

². More experimental demonstrations are listed at http://dymeng.gr.xjtu.edu.cn/8. The related Matlab codes of our algorithms can be downloaded at http://bit.ly/2j8BVZ2.


The larger the former three measures are, the closer the target MSI is to the reference one. ERGAS measures the fidelity of the restored image based on the weighted sum of the MSE in each band; different from the former three measures, the smaller ERGAS is, the better the target MSI estimates the reference one.

Synthetic noise simulations. The Columbia MSI Database [71]¹¹ is utilized in our simulated experiments. This dataset contains 32 real-world scenes of a variety of materials and objects, each with spatial resolution 512 × 512 and spectral resolution 31, covering full-spectral-resolution reflectance data collected from 400nm to 700nm in 10nm steps. In our experiments, each MSI is pre-scaled into the interval [0, 1].

Additive Gaussian noise with variance v is added to these testing MSIs to generate the noisy observations, with v ranging from 0.1 to 0.3. There are two parameters, λ and β, in our model. The former, λ, is used to balance two parts of the same order of magnitude, and thus it should be neither too small nor too large; we empirically find that our algorithm achieves satisfactory performance when λ is taken in the range [0.1, 10]. In all of the experiments here, we simply choose λ = 10. The parameter β depends on v, and we let β = c v⁻¹, where c is set to the constant 10⁻³. The two algorithm parameters, ρ and μ, are set to 1.05 and 250, respectively.

For each noise setting, all four PQI values of each competing MSI denoising method on all 32 scenes have been calculated and recorded. Table 1 lists the average performance (over different scenes and noise settings) of all methods. From this quantitative comparison, the advantage of the proposed method can be evidently observed. Specifically, our method significantly outperforms the other competing methods with respect to all the evaluation measures; e.g., our method achieves around a 1.5 dB gain in PSNR and a gain of 10 in ERGAS over the second-best method. Such superiority can be more comprehensively observed in the detailed reports listed at: http://dymeng.gr.xjtu.edu.cn/8.

To further depict the denoising performance of our method, we show in Fig. 6 two bands of chart and stuffed toy, centered at 400nm (the darker one) and 700nm (the brighter one), respectively. From the figure, it is easy to observe that the proposed method evidently performs better than the competing ones, both in the recovery of finer-grained textures and of coarser-grained structures. Especially when the band energy is low, most competing methods begin to fail, while our method still performs consistently well in such harder cases.

Real MSI experiments. The HYDICE MSI of natural scenes¹² is used in our experiments. The original MSI is of size 304 × 304 × 210. As bands 76, 100-115, 130-155, and 201-210 are seriously polluted by atmosphere and water absorption and provide little useful information, we remove them, leaving test data of size 304 × 304 × 157. We then pre-scaled the MSI into the interval [0, 1] for all competing methods. Since the noise level is unknown for real noisy images, we use an off-the-shelf noise estimation method [17] to estimate it and utilize this knowledge to set the parameter β. The parameters λ, ρ, and μ are set as in the previous section.

¹¹. http://www1.cs.columbia.edu/CAVE/databases/multispectral
¹². http://www.tec.army.mil/Hypercube

TABLE 1
Performance comparison (mean ± variance) of 11 competing methods with respect to 4 PQIs, averaged over all 32 scenes and all extents of noise. The best result in each PQI measure is highlighted in bold.

Method          PSNR        SSIM       FSIM       ERGAS
Noisy image     14.59±3.38  0.06±0.05  0.47±0.15  1151.54±534.17
BwK-SVD         27.77±2.01  0.47±0.10  0.81±0.06  234.58±66.73
BwBM3D          34.00±3.39  0.86±0.06  0.92±0.03  116.91±42.76
3DK-SVD         30.31±2.23  0.69±0.06  0.89±0.03  176.58±43.78
LRTA            33.78±3.37  0.82±0.09  0.92±0.03  120.79±46.06
PARAFAC         31.35±3.42  0.72±0.12  0.89±0.04  160.66±66.95
ANLM3D          34.12±3.19  0.86±0.07  0.93±0.03  117.01±35.79
Trace/TV        32.30±3.02  0.82±0.08  0.91±0.03  140.25±44.15
TDL             35.71±3.09  0.87±0.07  0.93±0.04  96.21±34.36
BM4D            36.18±3.03  0.86±0.07  0.94±0.03  91.20±29.70
t-SVD           35.88±3.10  0.91±0.04  0.96±0.02  93.65±31.68
KBR-denoising   37.71±3.39  0.91±0.05  0.96±0.02  78.21±31.59

TABLE 2
Performance comparison of 7 competing TC methods with respect to RRE on synthetic data with rank setting (30, 30, 30) (upper) and (10, 35, 40) (lower).

Method    20%       25%       30%       35%       40%
AlmMC     8.23e-01  7.66e-01  7.12e-01  6.53e-01  5.93e-01
HaLRTC    8.95e-01  8.65e-01  8.37e-01  8.06e-01  7.75e-01
Tmac      1.99e-01  4.39e-03  1.41e-04  3.39e-05  2.31e-05
t-SVD     9.29e-01  8.79e-01  8.27e-01  7.69e-01  7.09e-01
McpTC     3.79e-01  9.68e-05  7.55e-09  5.12e-09  4.16e-09
ScadTC    1.08e-01  1.61e-04  1.22e-05  6.02e-09  4.48e-09
KBR-TC    1.49e-01  7.09e-09  5.40e-09  4.18e-09  3.30e-09

Method    20%       25%       30%       35%       40%
AlmMC     5.67e-01  4.38e-01  3.07e-01  1.75e-01  6.33e-02
HaLRTC    8.95e-01  8.66e-01  8.37e-01  8.06e-01  7.74e-01
Tmac      9.06e-01  8.63e-01  8.11e-01  7.37e-01  6.35e-01
t-SVD     8.46e-01  7.64e-01  6.69e-01  5.68e-01  4.62e-01
McpTC     5.76e-01  2.10e-01  4.84e-03  3.50e-05  1.68e-07
ScadTC    4.96e-01  4.52e-02  4.96e-04  2.15e-05  4.49e-09
KBR-TC    2.20e-01  6.60e-09  4.69e-09  3.15e-09  2.32e-09

We illustrate the experimental results for the first band of the test MSI in Fig. 7. It can be seen that the band image restored by our method better removes the unexpected stripes and Gaussian noise while finely preserving the structure underlying the MSI, whereas the results obtained by most of the other competing methods retain a large amount of stripe noise. BM3D and BM4D perform comparatively better at stripe-noise removal, but their results contain evidently blurry areas, while our method recovers more of the details hidden under the corrupted MSI.

6.2 MSI completion by KBR-TC

In this section, we first test the performance of the KBR-TC method in simulations, and then validate its effectiveness in the application of MSI completion.

Experimental setting. The competing methods include: ADMM(ALM)-based matrix completion (MC-ALM) [36], HaLRTC [40], the factorization-based TC method (TMac) [70], the joint trace/TV based TC method (Trace/TV) [23], the tensor-SVD-based TC method (t-SVD) [74], the minmax concave plus penalty-based TC method (McpTC) [11], and the smoothly clipped absolute deviation penalty-based TC method (ScadTC) [11]. For the matrix-based method MC-ALM, we applied it to the unfolded matrix along each mode of the


Fig. 10. Performance variation of the proposed method in terms of the LRRE under different t and different sampling rates. (a) Ranks along each mode are (10, 30, 30). (b) Ranks along each mode are (25, 25, 25).

the proposed tensor sparsity measure. Specifically, by using this measure, the proposed methods outperform the state-of-the-art methods specifically designed for these tasks.

In our future research, we will investigate more deeply the theoretical properties underlying this measure (e.g., under which conditions the measure can help obtain an exact recovery from a corrupted/sampled tensor) and try to utilize it to ameliorate more tensor analysis applications. Besides, we will further investigate efficiency speedups and better initialization schemes for our algorithms.

REFERENCES

[1] M. Aharon. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Signal Processing, 54(11):4311–4322, 2006.
[2] S. D. Babacan, M. Luessi, R. Molina, and A. K. Katsaggelos. Sparse Bayesian methods for low-rank matrix estimation. IEEE Trans. Signal Processing, 60(8):3964–3977, 2012.
[3] Y. Benezeth, P.-M. Jodoin, B. Emile, H. Laurent, and C. Rosenberger. Comparative study of background subtraction algorithms. Journal of Electronic Imaging, 19(3):033003, 2010.
[4] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.
[5] A. M. Buchanan and A. W. Fitzgibbon. Damped Newton algorithms for matrix factorization with missing data. In CVPR, pages 316–322, 2005.
[6] E. J. Candes, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? Journal of the ACM (JACM), 58(3):11, 2011.
[7] E. J. Candes and Y. Plan. Matrix completion with noise. Proceedings of the IEEE, 98(6):925–936, 2010.
[8] E. J. Candes and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717–772, 2009.
[9] E. J. Candes, M. B. Wakin, and S. P. Boyd. Enhancing sparsity by reweighted l1 minimization. Journal of Fourier Analysis and Applications, 14(5-6):877–905, 2008.
[10] W. Cao, Y. Wang, J. Sun, D. Meng, C. Yang, A. Cichocki, and Z. Xu. A novel tensor robust PCA approach for background subtraction from compressive measurements. arXiv preprint arXiv:1503.01868, 2015.
[11] W. Cao, Y. Wang, C. Yang, X. Chang, Z. Han, and Z. Xu. Folded-concave penalization approaches to tensor completion. Neurocomputing, 152:261–273, 2015.
[12] A. A. Chen. The inpainting of hyperspectral images: A survey and adaptation to hyperspectral data. In SPIE, 2012.
[13] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Trans. Image Processing, 16(8):2080–2095, 2007.
[14] F. De La Torre and M. J. Black. A framework for robust subspace learning. International Journal of Computer Vision, 54(1-3):117–142, 2003.
[15] X. Ding, L. He, and L. Carin. Bayesian robust principal component analysis. IEEE Trans. Image Processing, 20(12):3419–3430, 2011.
[16] W. Dong, G. Shi, and X. Li. Nonlocal image restoration with bilateral variance estimation: A low-rank approach. IEEE Trans. Image Processing, 22(2):700–711, 2013.
[17] D. L. Donoho. De-noising by soft-thresholding. IEEE Trans. Information Theory, 41(3):613–627, 1995.
[18] M. Elad and M. Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans. Image Processing, 15(12):3736–3745, 2006.
[19] A. Eriksson and A. Van Den Hengel. Efficient computation of robust low-rank matrix approximations in the presence of missing data using the l1 norm. 2010.
[20] F. Fouss, A. Pirotte, J.-M. Renders, and M. Saerens. Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. IEEE Trans. Knowledge and Data Engineering, 19(3):355–369, 2007.
[21] S. Gandy, B. Recht, and I. Yamada. Tensor completion and low-n-rank tensor recovery via convex optimization. Inverse Problems, 27(2):025010, 2011.
[22] M. Golbabaee and P. Vandergheynst. Hyperspectral image compressed sensing via low-rank and joint-sparse matrix recovery. In IEEE International Conference on Acoustics, pages 2741–2744, 2012.
[23] M. Golbabaee and P. Vandergheynst. Joint trace/TV norm minimization: A new efficient approach for spectral compressive imaging. pages 933–936, 2012.

[24] D. Goldfarb and Z. Qin. Robust low-rank tensor recovery: Modelsand algorithms. SIAM Journal on Matrix Analysis and Applications,35(1):225–253, 2014.

[25] P. Gong, C. Zhang, Z. Lu, J. Z. Huang, and J. Ye. A general iterativeshrinkage and thresholding algorithm for non-convex regularizedoptimization problems. In ICML, 2013.

[26] S. Gu, L. Zhang, W. Zuo, and X. Feng. Weighted nuclear normminimization with application to image denoising. In CVPR, 2014.

[27] W. He, H. Zhang, L. Zhang, and H. Shen. Total-variation-regularized low-rank matrix factorization for hyperspectral imagerestoration. IEEE Transactions on Geoscience and Remote Sensing,54(1):178–188, 2016.

[28] Y. Hu, D. Zhang, J. Ye, X. Li, and X. He. Fast and accurate matrixcompletion via truncated nuclear norm regularization. IEEE Trans.Pattern Analysis and Machine Intelligence, 35(9):2117–2130, 2013.

[29] R. Kawakami, J. Wright, Y.-W. Tai, Y. Matsushita, M. Ben-Ezra,and K. Ikeuchi. High-resolution hyperspectral imaging via matrixfactorization. In CVPR, 2011.

[30] Q. Ke and T. Kanade. Robust l1 norm factorization in the presenceof outliers and missing data by alternative convex programming.In CVPR, pages 739–746, 2005.

[31] T. G. Kolda and B. W. Bader. Tensor decompositions and applica-tions. SIAM Review, 51(3):455–500, 2009.

[32] T. G. Kolda, B. W. Bader, and J. P. Kenny. Higher-order web linkanalysis using multilinear algebra. In Data Mining, Fifth IEEEInternational Conference on, pages 8–pp. IEEE, 2005.

[33] L. D. Lathauwer, B. D. Moor, and J. Vandewalle. A multilinearsingular value decomposition. Siam Journal on Matrix Analysis &Applications, 21(4):1253–1278, 2000.

[34] L. Li, W. Huang, I. Y.-H. Gu, and Q. Tian. Statistical modeling ofcomplex backgrounds for foreground object detection. IEEE Trans.Image Processing, 13(11):1459–1472, 2004.

[35] N. Li and B. Li. Tensor completion for on-board compression ofhyperspectral images. In ICIP, pages 517–520. IEEE, 2010.

[36] Z. Lin, M. Chen, and Y. Ma. The augmented lagrange multipliermethod for exact recovery of corrupted low-rank matrices. EprintArxiv, 9, 2010.

[37] Z. Lin, A. Ganesh, J. Wright, L. Wu, M. Chen, and Y. Ma. Fastconvex optimization algorithms for exact recovery of a corruptedlow-rank matrix. Computational Advances in Multi-Sensor AdaptiveProcessing (CAMSAP), 61, 2009.

[38] Z. Lin, R. Liu, and Z. Su. Linearized alternating direction methodwith adaptive penalty for low-rank representation. Advances inNeural Information Processing Systems, pages 612–620, 2011.

[39] J. Liu, P. Musialski, P. Wonka, and J. Ye. Tensor completion forestimating missing values in visual data. In ICCV, 2009.

[40] J. Liu, P. Musialski, P. Wonka, and J. Ye. Tensor completion forestimating missing values in visual data. IEEE Trans. PatternAnalysis & Machine Intelligence, 35(1):208–220, 2013.

[41] X. Liu, S. Bourennane, and C. Fossati. Denoising of hyperspectralimages using the parafac model and statistical performance anal-ysis. IEEE Trans. Geoscience and Remote Sensing, 50(10):3717–3724,2012.

[42] C. Lu, J. Feng, Y. Chen, W. Liu, Z. Lin, and S. Yan. Tensor robustprincipal component analysis: Exact recovery of corrupted low-rank tensors via convex optimization. In The IEEE Conference onComputer Vision and Pattern Recognition (CVPR), June 2016.

[43] C. Lu, C. Zhu, C. Xu, S. Yan, and Z. Lin. Generalized singularvalue thresholding. In AAAI, 2015.


[44] M. Maggioni and A. Foi. Nonlocal transform-domain denoising of volumetric data with groupwise adaptive variance estimation. In SPIE, 2012.

[45] M. Maggioni, V. Katkovnik, K. Egiazarian, and A. Foi. A non-local transform-domain filter for volumetric data denoising and reconstruction. IEEE Trans. Image Processing, 22(1):119–133, 2013.

[46] J. V. Manjón, P. Coupé, L. Martí-Bonmatí, D. L. Collins, and M. Robles. Adaptive non-local means denoising of MR images with spatially varying noise levels. Journal of Magnetic Resonance Imaging, 31(1):192–203, 2010.

[47] D. Meng and F. De La Torre. Robust matrix factorization with unknown noise. In ICCV, pages 1337–1344, 2013.

[48] L. Mirsky. A trace inequality of John von Neumann. Monatshefte für Mathematik, 79(4):303–306, 1975.

[49] N. Renard, S. Bourennane, and J. Blanc-Talon. Denoising and dimensionality reduction using multilinear tools for hyperspectral images. IEEE Geoscience and Remote Sensing Letters, 5(2):138–142, 2008.

[50] F. Parvaresh, H. Vikalo, S. Misra, and B. Hassibi. Recovering sparse signals using sparse measurement matrices in compressed DNA microarrays. IEEE Journal of Selected Topics in Signal Processing, 2(3):275–285, 2008.

[51] K. A. Patwardhan, G. Sapiro, and M. Bertalmío. Video inpainting under constrained camera motion. IEEE Trans. Image Processing, 16(2):545–553, 2007.

[52] Y. Peng, D. Meng, Z. Xu, C. Gao, Y. Yang, and B. Zhang. Decomposable nonlocal tensor dictionary learning for multispectral image denoising. In CVPR, 2014.

[53] H. Rauhut, R. Schneider, and Ž. Stojanac. Low rank tensor recovery via iterative hard thresholding. Linear Algebra and Its Applications, 2017.

[54] S. Raychaudhuri, J. M. Stuart, and R. B. Altman. Principal components analysis to summarize microarray experiments: Application to sporulation time series. In Pacific Symposium on Biocomputing, page 455, 2000.

[55] E. Richard, F. Bach, and J. P. Vert. Intersecting singularities for multi-structured estimation. In ICML, pages 1157–1165, 2013.

[56] E. Richard, P.-A. Savalle, and N. Vayatis. Estimation of simultaneously sparse and low rank matrices. arXiv preprint arXiv:1206.6474, 2012.

[57] B. Romera-Paredes and M. Pontil. A new convex relaxation for tensor completion. In NIPS, 2013.

[58] M. Mørup, L. K. Hansen, and S. M. Arnfred. Algorithms for sparse nonnegative Tucker decompositions. Neural Computation, 20(8), 2008.

[59] A. C. Sauve, A. O. Hero, W. L. Rogers, S. J. Wilderman, and N. H. Clinthorne. 3D image reconstruction for a Compton SPECT camera model. IEEE Trans. Nuclear Science, 46(6):2075–2084, 1999.

[60] A. Shashua. On photometric issues in 3D visual recognition from a single 2D image. International Journal of Computer Vision, 21(1-2):99–122, 1997.

[61] N. Srebro and T. Jaakkola. Weighted low-rank approximations. In ICML, pages 720–727, 2003.

[62] O. Taheri, S. Vorobyov, et al. Sparse channel estimation with lp-norm and reweighted l1-norm penalized least mean squares. In ICASSP, 2011.

[63] R. Tibshirani. Regression shrinkage and selection via the lasso: A retrospective. Journal of the Royal Statistical Society: Series B, 73(3):273–282, 2011.

[64] L. R. Tucker. Some mathematical notes on three-mode factor analysis. Psychometrika, 31(3):279–311, 1966.

[65] L. Wald. Data Fusion: Definitions and Architectures: Fusion of Images of Different Spatial Resolutions. Presses de l'École des Mines, 2002.

[66] H. Wang, F. Nie, and H. Huang. Low-rank tensor completion with spatio-temporal consistency. In AAAI, 2014.

[67] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Processing, 13(4):600–612, 2004.

[68] J. Wright, A. Ganesh, S. Rao, Y. Peng, and Y. Ma. Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization. In NIPS, 2009.

[69] Q. Xie, Q. Zhao, D. Meng, Z. Xu, S. Gu, W. Zuo, and L. Zhang. Multispectral images denoising by intrinsic tensor sparsity regularization. In CVPR, 2016.

[70] Y. Xu, R. Hao, W. Yin, and Z. Su. Parallel matrix factorization for low-rank tensor completion. Inverse Problems and Imaging, 9(2), 2015.

[71] F. Yasuma, T. Mitsunaga, D. Iso, and S. K. Nayar. Generalized assorted pixel camera: Postcapture control of resolution, dynamic range, and spectrum. IEEE Trans. Image Processing, 19(9):2241–2253, 2010.

[72] H. Zhang, W. He, L. Zhang, H. Shen, and Q. Yuan. Hyperspectral image restoration using low-rank matrix recovery. IEEE Trans. Geoscience and Remote Sensing, 52(8):4729–4743, 2014.

[73] L. Zhang, L. Zhang, X. Mou, and D. Zhang. FSIM: A feature similarity index for image quality assessment. IEEE Trans. Image Processing, 20(8):2378–2386, 2011.

[74] Z. Zhang, G. Ely, S. Aeron, N. Hao, and M. Kilmer. Novel methods for multilinear data completion and de-noising based on tensor-SVD. In CVPR, 2014.

[75] Q. Zhao, D. Meng, Z. Xu, W. Zuo, and L. Zhang. Robust principal component analysis with complex noise. In ICML, pages 55–63, 2014.

[76] X. Zhao, F. Wang, T. Huang, M. K. Ng, and R. J. Plemmons. Deblurring and sparse unmixing for hyperspectral images. IEEE Trans. Geoscience and Remote Sensing, 51(7):4045–4058, 2013.

Qi Xie received the B.Sc. degree from Xi'an Jiaotong University, Xi'an, China, in 2013, where he is currently pursuing the Ph.D. degree. His current research interests include low-rank matrix factorization, tensor recovery, and sparse machine learning methods.

Qian Zhao received the B.Sc. and Ph.D. degrees from Xi'an Jiaotong University, Xi'an, China, in 2009 and 2015, respectively. He was a Visiting Scholar with Carnegie Mellon University, Pittsburgh, PA, USA, from 2014 to 2015. He is currently a Lecturer with the School of Mathematics and Statistics, Xi'an Jiaotong University. His current research interests include low-rank matrix/tensor analysis, Bayesian modeling and self-paced learning.

Deyu Meng received the B.Sc., M.Sc., and Ph.D. degrees from Xi'an Jiaotong University, Xi'an, China, in 2001, 2004, and 2008, respectively. He was a Visiting Scholar with Carnegie Mellon University, Pittsburgh, PA, USA, from 2012 to 2014. He is currently a Professor with the Institute for Information and System Sciences, Xi'an Jiaotong University. His current research interests include self-paced learning, noise modeling and tensor sparsity.

Zongben Xu received the Ph.D. degree in mathematics from Xi'an Jiaotong University, Xi'an, China, in 1987. He currently serves as an Academician of the Chinese Academy of Sciences, the Chief Scientist of the National Basic Research Program of China (973 Project), and the Director of the Institute for Information and System Sciences at Xi'an Jiaotong University. His current research interests include nonlinear functional analysis and intelligent information processing. He was a recipient of the National Natural Science Award of China in 2007 and the winner of the CSIAM Su Buchin Applied Mathematics Prize in 2008. He delivered a talk at the International Congress of Mathematicians in 2010.