

  • Decomposable Nonlocal Tensor Dictionary Learning for Multispectral Image Denoising

    Yi Peng¹, Deyu Meng¹, Zongben Xu¹, Chenqiang Gao², Yi Yang³, Biao Zhang¹
    ¹Xi’an Jiaotong University; ²Carnegie Mellon University; ³The University of Queensland

    {weichiche}@gmail.com, {dymeng, zbxu}@mail.xjtu.edu.cn, {gaochenqiang, yee.i.yang, megatronbiao}@gmail.com

    Abstract

    Compared with conventional RGB or gray-scale images, multispectral images (MSIs) deliver a more faithful representation of real scenes and enhance the performance of many computer vision tasks. In practice, however, an MSI is always corrupted by various kinds of noise. In this paper we propose an effective MSI denoising approach that jointly considers two intrinsic characteristics of an MSI: the nonlocal similarity over space and the global correlation across spectrum. Specifically, by explicitly considering the spatial self-similarity of an MSI, we construct a nonlocal tensor dictionary learning model with a group-block-sparsity constraint, which makes similar full-band patches (FBPs) share the same atoms from the spatial and spectral dictionaries. Furthermore, by exploiting the spectral correlation of an MSI and assuming over-redundancy of the dictionaries, the constrained nonlocal MSI dictionary learning model can be decomposed into a series of unconstrained low-rank tensor approximation problems, which can be readily solved by off-the-shelf higher-order statistics. Experimental results show that our method outperforms all state-of-the-art MSI denoising methods under comprehensive quantitative performance measures.

    1. Introduction

    The radiance of a real scene is distributed across a wide range of spectral bands. A multispectral image (MSI) consists of multiple intensities that represent the integrals of the radiance captured by sensors over various discrete bands. For example, conventional RGB images are obtained by integrating the product of the intensity at three typical band intervals. Compared with the traditional imaging system, an MSI delivers a more faithful representation of real scenes, and has been shown to greatly enhance the performance of various computer vision tasks, such as inpainting [8], superresolution [12] and tracking [21].

    Figure 1. (a) Spatial nonlocal similarity: a collection of similar local patches over the spatial dimensions of the multispectral image (middle). (b) Spectral global correlation: the highly correlated images obtained across the entire spectral dimension of this multispectral image.

    In real cases, however, an MSI is always corrupted by noise that is generally caused by equipment limitations such as sensor sensitivity, photon effects and calibration error [13, 2]. Besides, since the radiance energy is limited and the band width is sometimes fairly narrow, the energy captured by each sensor might be low. Shot noise and thermal noise then occur inevitably. The denoising problem for MSIs is thus still of acute and growing importance [14, 23, 10].

    In this paper, we propose a novel tensor dictionary learning model for MSI denoising that combines two characteristics of MSIs in a single framework: nonlocal similarity in space and global correlation in spectrum. On the one hand, a typical natural scene contains a collection of similar local patches all over the space, composed of homologous aggregations of micro-structures. By averaging among these nonlocally similar patches, the spatial noise is expected to be substantially alleviated [22, 18, 15]. On the other hand, an MSI contains a large amount of spectral redundancy [23]. That is, images obtained over different bands are always highly correlated. By extracting the major components from this globally correlated spectral information, the spectral MSI noise (the minor components) is expected to be eliminated. Both characteristics are illustrated in Fig. 1. In our model, we employ a grouped sparsity regularizer that forces similar MSI patches to share the same dictionary atoms in their sparse decomposition, which implicitly averages out the noise among these patches. Furthermore, by assuming redundant dictionaries over both the space and the spectrum, the proposed tensor dictionary learning model can be readily decomposed into a series of low-rank tensor approximation problems. Each of these problems corresponds to a spectral dimensionality reduction model induced by the spectral correlation property of MSIs, and can be easily solved by off-the-shelf higher-order statistics. The spectral redundancy problem can thus be alleviated.

    Throughout the paper, we denote scalars, vectors, matrices and tensors by non-bold letters, bold lower case letters, bold upper case letters and calligraphic upper case letters, respectively.

    2. Notions and Preliminaries

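    We first introduce some necessary notions and preliminaries.

    A tensor of order $N$, which corresponds to an $N$-dimensional data array, is denoted as $\mathcal{A} \in \mathbb{R}^{I_1 \times \cdots \times I_n \times \cdots \times I_N}$. Elements of $\mathcal{A}$ are denoted as $a_{i_1 \cdots i_n \cdots i_N}$, where $1 \le i_n \le I_n$. The mode-$n$ vectors of an $N$th-order tensor $\mathcal{A}$ are the $I_n$-dimensional vectors obtained from $\mathcal{A}$ by varying the index $i_n$ while keeping the other indices fixed. The matrix $\mathbf{A}_{(n)} \in \mathbb{R}^{I_n \times (I_1 \cdots I_{n-1} I_{n+1} \cdots I_N)}$ is composed by taking the mode-$n$ vectors of $\mathcal{A}$ as its columns. This matrix can also be naturally seen as the mode-$n$ flattening of the tensor $\mathcal{A}$. The $n$-rank of $\mathcal{A}$, denoted as $r_n$, is the dimension of the vector space spanned by the mode-$n$ vectors of $\mathcal{A}$.

    The product of two matrices can be generalized to the product of a tensor and a matrix. The mode-$n$ product of a tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times \cdots \times I_n \times \cdots \times I_N}$ by a matrix $\mathbf{B} \in \mathbb{R}^{J_n \times I_n}$, denoted by $\mathcal{A} \times_n \mathbf{B}$, is also an $N$th-order tensor $\mathcal{C} \in \mathbb{R}^{I_1 \times \cdots \times J_n \times \cdots \times I_N}$, whose entries are computed by

    $$c_{i_1 \cdots i_{n-1} j_n i_{n+1} \cdots i_N} = \sum_{i_n} a_{i_1 \cdots i_{n-1} i_n i_{n+1} \cdots i_N} \, b_{j_n i_n}.$$

    The mode-$n$ product $\mathcal{C} = \mathcal{A} \times_n \mathbf{B}$ can also be calculated by the matrix multiplication $\mathbf{C}_{(n)} = \mathbf{B} \mathbf{A}_{(n)}$, followed by a re-tensorization that undoes the mode-$n$ flattening. The Frobenius norm of a tensor $\mathcal{A}$ is defined as

    $$\|\mathcal{A}\|_F = \Big( \sum_{i_1, \cdots, i_N} |a_{i_1 \cdots i_N}|^2 \Big)^{1/2}.$$

    In the following, we write $\|\mathcal{A}\|_F$ as $\|\mathcal{A}\|$ for short.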
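The mode-n flattening and the mode-n product defined above can be sketched in NumPy as follows. This is a minimal illustration, not code from the paper: the helper names `unfold`, `fold` and `mode_product` are hypothetical, modes are 0-based, and the column ordering of the flattening is just one self-consistent choice.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n flattening: the mode-n vectors become the columns of an I_n x (rest) matrix."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def fold(matrix, mode, shape):
    """Undo unfold(); `shape` is the full shape of the tensor being rebuilt."""
    rest = [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(matrix.reshape([shape[mode]] + rest), 0, mode)

def mode_product(tensor, matrix, mode):
    """Mode-n product A x_n B, computed as C_(n) = B A_(n) followed by re-tensorization."""
    new_shape = list(tensor.shape)
    new_shape[mode] = matrix.shape[0]
    return fold(matrix @ unfold(tensor, mode), mode, new_shape)

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4, 5))
B = rng.standard_normal((2, 4))          # J_n x I_n for the second mode (index 1)
C = mode_product(A, B, mode=1)

# Entry-wise definition: c[i, j, k] = sum_m a[i, m, k] * b[j, m]
C_ref = np.einsum("imk,jm->ijk", A, B)

# Frobenius norm and n-rank of the second mode
frob = np.sqrt(np.sum(np.abs(A) ** 2))
n_rank = np.linalg.matrix_rank(unfold(A, 1))
```

The flattening-based product and the entry-wise sum agree up to floating-point error, which is the identity stated in the text.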

    3. Related Work

    There are mainly two approaches to MSI denoising: the 2D extended approach and the tensor-based approach.

    2D extended approach: As one of the classical problems in computer vision, 2D image denoising has been studied for more than 50 years, and a large number of methods have been proposed for it, such as NLM [4], K-SVD [20] and BM3D [9]. These methods can be directly applied to MSI denoising by treating the images located at different bands separately. This extension, however, neglects the intrinsic properties of MSIs and generally cannot attain good performance in real applications. Another, more reasonable extension is specifically designed for patch-based image denoising methods, which work on small local patches of the image. By building small 3D cubes of an MSI instead of 2D patches of a traditional image, the corresponding 3D-cube-based MSI denoising algorithm can be constructed [22]. The state of the art among 3D-cube-based approaches is represented by the BM4D method [15, 16], which exploits the 3D nonlocal similarity of an MSI to remove noise in similar MSI 3D cubes collaboratively. These methods, however, do not take into account the high correlation across the MSI spectrum, and thus still leave much room for improvement.

    Tensor-based approach: An MSI is composed of a stack of 2D images, which can be naturally regarded as a 3rd-order tensor. The tensor-based approach performs MSI denoising by applying tensor factorization techniques to the MSI tensor. As a special case of multiway filtering, tensor factorization can be seen as an extension of the traditional singular value decomposition (SVD). The state of the art along this line of research is represented by two approaches. Renard et al. [23] presented a low-rank tensor approximation (LRTA) method that employs the Tucker factorization [24] to obtain a low-rank approximation of the input MSI. Very recently, Liu et al. [14] designed the PARAFAC method by utilizing parallel factor analysis [7]. The advantage of both methods is that they take the correlation between MSI images over different bands into consideration and try to eliminate the spectral redundancy of MSIs. However, they do not utilize the nonlocal similarity property of MSIs, and their performance may be sensitive to the noise level and type.

    4. Decomposable Nonlocal MSI Dictionary Learning

    In this section, we first introduce the tensor dictionary learning (DL) model, and then present the main idea of our decomposable nonlocal MSI DL model and the related algorithm. The parameter setting problems are also discussed thereafter.

    4.1. From Image DL to MSI DL

    We first briefly introduce the traditional DL model for image restoration. For a set of image patches (ordered lexicographically as column vectors) $\{\mathbf{x}_i\}_{i=1}^{n} \subset \mathbb{R}^{d}$, where $d$ is the dimensionality and $n$ is the number of image patches, DL aims to calculate the dictionary $\mathbf{D} = [\mathbf{d}_1, \cdots, \mathbf{d}_m] \in \mathbb{R}^{d \times m}$, composed of a collection of atoms $\mathbf{d}_i$ ($m > d$, implying that the dictionary is redundant), and the coefficient matrix $\mathbf{Z} = [\mathbf{z}_1, \cdots, \mathbf{z}_n] \in \mathbb{R}^{m \times n}$, composed of the representation coefficients $\mathbf{z}_i$ of $\mathbf{x}_i$, by the following optimization model [1]:

    $$\min_{\mathbf{D}, \mathbf{z}_1, \cdots, \mathbf{z}_n} \sum_{i=1}^{n} \|\mathbf{x}_i - \mathbf{D}\mathbf{z}_i\| \quad \text{s.t.} \quad P(\mathbf{z}_i) \le k, \qquad (1)$$

    where $P(\cdot)$ denotes a certain sparsity-controlling operator such as the $l_0$ or $l_1$ norm.

    A similar dictionary learning model can be easily extended to the MSI case. First we construct MSI patches as in the image case. An MSI with $d_W \times d_H$ spatial resolution ($d_W$, $d_H$ denote the spatial width and height of the MSI, respectively) and $d_S$ spectral bands can be expressed as a 3rd-order tensor $\mathcal{H} \in \mathbb{R}^{d_W \times d_H \times d_S}$ with two spatial modes and one spectral mode. By sweeping across the whole MSI with overlaps, we can build a group of 3D full-band patches (FBPs) $\{\mathcal{P}_{i,j}\}_{1 \le i \le d_W - d_w + 1,\; 1 \le j \le d_H - d_h + 1} \subset \mathbb{R}^{d_w \times d_h \times d_S}$ ($d_w < d_W$, $d_h < d_H$) from the MSI. For simplicity, we reformulate all FBPs as a group of 3D patches $\{\mathcal{X}_i\}_{i=1}^{n}$, where $n = (d_W - d_w + 1)(d_H - d_h + 1)$ denotes the number of patches. Each FBP so constructed is local in the spatial dimensions but global in the spectral dimension, which makes it easy to consider the two important properties underlying an MSI: the nonlocal similarity between spatial patches and the global correlation across all bands.
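The FBP construction described above can be sketched as a sliding window that keeps every band. This is a minimal illustration under stated assumptions: the function name `extract_fbps` and the toy MSI size are hypothetical, and a stride of one pixel is assumed because the patch count given in the text corresponds to it.

```python
import numpy as np

def extract_fbps(msi, dw, dh):
    """Return all overlapping full-band patches as an array of shape (n, dw, dh, dS)."""
    dW, dH, dS = msi.shape
    patches = [
        msi[i:i + dw, j:j + dh, :]          # local in space, all dS bands kept
        for i in range(dW - dw + 1)
        for j in range(dH - dh + 1)
    ]
    return np.stack(patches)

msi = np.random.default_rng(0).random((20, 16, 31))   # toy MSI, hypothetical size
fbps = extract_fbps(msi, dw=8, dh=8)

# Patch count from the text: n = (dW - dw + 1)(dH - dh + 1)
n_expected = (20 - 8 + 1) * (16 - 8 + 1)
```

For a 512 × 512 scene this dense extraction produces a very large patch array, so a practical implementation would likely process it in chunks or with strided views.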

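    Based on this FBP set $\{\mathcal{X}_i\}_{i=1}^{n}$, the MSI DL model can then be constructed to calculate the spatial and spectral dictionaries $\{\mathbf{D}^W \in \mathbb{R}^{d_w \times m_W}, \mathbf{D}^H \in \mathbb{R}^{d_h \times m_H}, \mathbf{D}^S \in \mathbb{R}^{d_S \times m_S}\}$ with $m_W > d_w$, $m_H > d_h$ and $m_S > d_S$, implying the redundancy of these dictionaries, as follows:

    $$\min_{\mathbf{D}^W, \mathbf{D}^H, \mathbf{D}^S, \mathcal{Z}_i} \sum_{i=1}^{n} \big\| \mathcal{X}_i - \mathcal{Z}_i \times_1 \mathbf{D}^W \times_2 \mathbf{D}^H \times_3 \mathbf{D}^S \big\| \quad \text{s.t.} \quad P(\mathcal{Z}_i) \le k, \qquad (2)$$

    where $\mathcal{Z}_i \in \mathbb{R}^{m_W \times m_H \times m_S}$ is the coefficient tensor of $\mathcal{X}_i$, which governs the interaction between the dictionaries, and $P(\cdot)$ denotes a sparsity regularization term such as the $l_0$ or $l_1$ operator [31].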

    4.2. From Image Group-Sparsity to MSI Group-Block-Sparsity

    DL has been effectively applied to image denoising by considering the nonlocal similarity property of images [17]. The basic idea is to first group similar patches into clusters $X^{(k)} = \{\mathbf{x}_{i_j^k}\}_{j=1}^{n_k}$, $k = 1, 2, \cdots, K$, where $K$ is the number of clusters, $n_k$ is the number of patches in the $k$th cluster and $i_j^k$ denotes the index of the $j$th patch in the $k$th cluster, and then to encourage the patches of each cluster to share similar atoms in the dictionary. Let us denote the coefficient matrix corresponding to the $k$th cluster $X^{(k)}$ as $\mathbf{Z}^{(k)} = [\mathbf{z}_{i_1^k}, \mathbf{z}_{i_2^k}, \cdots, \mathbf{z}_{i_{n_k}^k}] \in \mathbb{R}^{m \times n_k}$. This simultaneous-sparse-coding aim can then be achieved by applying to (1) the following group-sparsity regularizer on each $\mathbf{Z}^{(k)}$ [17]:

    $$\|\mathbf{Z}^{(k)}\|_{p,q} = \sum_{i=1}^{m} \|\hat{\mathbf{z}}_i^k\|_q^p, \qquad (3)$$

    where $\hat{\mathbf{z}}_i^k$ denotes the $i$th row vector of $\mathbf{Z}^{(k)}$. The pair $(p, q)$ is usually set as $(1, 2)$ or $(0, \infty)$. Such a group-sparsity regularizer encourages some rows of $\mathbf{Z}^{(k)}$ to be entirely zero, as depicted in Fig. 2.
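The group-sparsity regularizer of Eq. (3) can be evaluated directly from the row norms of a coefficient matrix. The sketch below is illustrative only: the function name `group_sparsity` is hypothetical, and for p = 0 it adopts the usual convention that the term counts the nonzero rows.

```python
import numpy as np

def group_sparsity(Z, p, q):
    """||Z||_{p,q} = sum_i ||row_i||_q ** p, counting nonzero rows when p = 0."""
    row_norms = np.linalg.norm(Z, ord=q, axis=1)   # one q-norm per dictionary atom (row)
    if p == 0:
        return float(np.count_nonzero(row_norms))
    return float(np.sum(row_norms ** p))

# Four atoms, three patches in the cluster; atoms 1 and 2 are unused by every patch.
Z = np.array([[3.0, 4.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])

l12 = group_sparsity(Z, p=1, q=2)          # sum of row l2-norms: 5 + 0 + 0 + 1
l0inf = group_sparsity(Z, p=0, q=np.inf)   # number of rows with a nonzero entry
```

Both choices penalize the number or size of active rows, which is why minimizing them pushes all patches of a cluster toward the same small set of atoms.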

    This nonlocal method can be easily extended to the MSI case as follows. First, we group similar FBPs into clusters denoted by $\{\mathcal{X}_{i_j^k}\}_{j=1}^{n_k}$ ($k = 1, 2, \cdots, K$), where $K$ is the number of clusters, $n_k$ is the number of FBPs in the $k$th cluster and $i_j^k$ denotes the index of the $j$th patch in the $k$th cluster. Then we attempt to force the FBPs of each cluster to share similar atoms in each of the spatial dictionaries $\mathbf{D}^W$, $\mathbf{D}^H$ and the spectral dictionary $\mathbf{D}^S$. For convenience we combine the FBP samples in the $k$th cluster to form a 4th-order tensor $\mathcal{X}^{(k)} \in \mathbb{R}^{d_w \times d_h \times d_S \times n_k}$, whose additional 4th mode corresponds to the FBPs located at different spatial positions of the MSI. Analogously, we align all coefficient tensors $\{\mathcal{Z}_{i_j^k}\}_{j=1}^{n_k}$ corresponding to the $k$th FBP cluster to form $\mathcal{Z}^{(k)} \in \mathbb{R}^{m_W \times m_H \times m_S \times n_k}$. The aim of the nonlocal MSI tensor DL can then be attained by the following group-block-sparsity regularizer.

    Definition 1 (Group-block-sparsity) For a coefficient tensor $\mathcal{Z} \in \mathbb{R}^{m_W \times m_H \times m_S \times n}$, its group-block-sparsity with respect to the spatial and spectral modes is $\|\mathcal{Z}\|_B = (r^W, r^H, r^S)$ if and only if the smallest index subsets $I^W, I^H, I^S$ satisfying $z_{i_1 i_2 i_3 i_4} = 0$ for all $(i_1, i_2, i_3) \notin I^W \times I^H \times I^S$ contain $r^W, r^H, r^S$ elements, respectively. $\mathrm{Sub}(\mathcal{Z}) \in \mathbb{R}^{r^W \times r^H \times r^S \times n}$ denotes the intrinsic sub-tensor of $\mathcal{Z}$ extracted from the entries of the three dimensions of $\mathcal{Z}$ specified by the index sets $I^W, I^H, I^S$, respectively.

    The above definition is illustrated in Fig. 2. Note that the group-sparsity [17] can be seen as the degenerate case of the group-block-sparsity for 2D images. Furthermore, when we set $n = 1$ (meaning only one FBP in a cluster), the group-block-sparsity so defined corresponds exactly to the concept of block sparsity proposed in [6], which has been shown to enable better recovery of the original high-order signals, since it implicitly incorporates valuable prior information on real signals and makes full use of the dictionary atoms of each mode in signal representation.
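Definition 1 can be made concrete with a short NumPy sketch that reads the group-block-sparsity and the intrinsic sub-tensor off a given coefficient tensor. The function name `group_block_sparsity` and the toy tensor are hypothetical; the code relies on the observation that the smallest admissible index set for a mode consists of exactly those indices whose slice contains a nonzero entry.

```python
import numpy as np

def group_block_sparsity(Z):
    """Return (r_W, r_H, r_S), the index sets (I_W, I_H, I_S) and Sub(Z) for a 4th-order Z."""
    nz = Z != 0
    I_W = np.flatnonzero(nz.any(axis=(1, 2, 3)))   # width atoms used by any patch
    I_H = np.flatnonzero(nz.any(axis=(0, 2, 3)))   # height atoms used by any patch
    I_S = np.flatnonzero(nz.any(axis=(0, 1, 3)))   # spectral atoms used by any patch
    sub = Z[np.ix_(I_W, I_H, I_S, np.arange(Z.shape[3]))]
    return (len(I_W), len(I_H), len(I_S)), (I_W, I_H, I_S), sub

# Toy coefficient tensor: m_W = 6, m_H = 5, m_S = 7 atoms and n = 4 patches.
Z = np.zeros((6, 5, 7, 4))
Z[np.ix_([1, 4], [0, 2, 3], [5], range(4))] = 1.0

ranks, index_sets, sub = group_block_sparsity(Z)
```

All four patches in this toy cluster use the same 2 × 3 × 1 block of atoms, which is the sharing pattern the regularizer is meant to enforce.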

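    Then we can construct the following nonlocal MSI DL model:

    $$\min_{\mathbf{D}^W, \mathbf{D}^H, \mathbf{D}^S, \mathcal{Z}^{(k)}} \sum_{k=1}^{K} \big\| \mathcal{X}^{(k)} - \mathcal{Z}^{(k)} \times_1 \mathbf{D}^W \times_2 \mathbf{D}^H \times_3 \mathbf{D}^S \big\| \quad \text{s.t.} \quad \|\mathcal{Z}^{(k)}\|_B \preceq (r_k^W, r_k^H, r_k^S), \qquad (4)$$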

  • Figure 2. Upper: The image group-sparsity model. In each group (Group 1 to Group $K$) the coefficient vectors $\mathbf{Z}^{(k)}$ ($k = 1, \cdots, K$) share the same atoms of the dictionary $\mathbf{D}$. Lower: The MSI group-block-sparsity model. In each group the coefficient tensors $\mathcal{Z}^{(k)}$ ($k = 1, \cdots, K$) share the same atoms of the spatial dictionaries $\mathbf{D}^W$, $\mathbf{D}^H$ and the spectral dictionary $\mathbf{D}^S$.

    where $\mathbf{v}_1 \preceq \mathbf{v}_2$ denotes that each entry of $\mathbf{v}_1$ is no more than the corresponding entry of $\mathbf{v}_2$. The group-block-sparsity of $\mathcal{Z}^{(k)}$ guarantees that each cluster $\mathcal{X}^{(k)}$ shares $r_k^W, r_k^H, r_k^S$ atoms of the dictionaries $\mathbf{D}^W, \mathbf{D}^H, \mathbf{D}^S$, respectively, and thus encodes the nonlocal similarity among the samples of the cluster.

    There are two remaining problems in the construction of the nonlocal MSI DL model (4): how to generate the clusters of FBPs and how to set the group-block-sparsity thresholds $r_k^W, r_k^H, r_k^S$. For the first problem, we simply employ the very efficient $k$-means++ [3] (with automatically and carefully chosen initial seeds) to obtain clusters of all FBPs. The second problem is discussed in the next section.

    4.3. Decomposable Nonlocal MSI DL Model

    The nonlocal MSI DL problem can be further simplified by assuming that the dictionaries $\mathbf{D}^W, \mathbf{D}^H, \mathbf{D}^S$ are redundant enough that the dictionary atoms utilized in different clusters have no overlap. That is, we assume that the spatial and spectral dictionaries can be reformulated as $\mathbf{D}^W = [\mathbf{D}_1^W, \cdots, \mathbf{D}_K^W]$, $\mathbf{D}^H = [\mathbf{D}_1^H, \cdots, \mathbf{D}_K^H]$ and $\mathbf{D}^S = [\mathbf{D}_1^S, \cdots, \mathbf{D}_K^S]$, where $\mathbf{D}_k^W \in \mathbb{R}^{d_w \times r_k^W}$, $\mathbf{D}_k^H \in \mathbb{R}^{d_h \times r_k^H}$ and $\mathbf{D}_k^S \in \mathbb{R}^{d_S \times r_k^S}$ with $\sum_{k=1}^{K} r_k^W = m_W$, $\sum_{k=1}^{K} r_k^H = m_H$ and $\sum_{k=1}^{K} r_k^S = m_S$, respectively, such that each MSI cluster $\mathcal{X}^{(k)}$ is only related to the sub-dictionaries $\mathbf{D}_k^W$, $\mathbf{D}_k^H$ and $\mathbf{D}_k^S$. The rationale for this assumption lies in the redundancy of the spatial and spectral dictionaries: even when two clusters share an atom of a dictionary, the assumption still holds after simply duplicating this atom in the dictionary. Under this assumption, each element in the sum of Eq. (4) can be equivalently reformulated as:

    $$\big\| \mathcal{X}^{(k)} - \mathcal{Z}^{(k)} \times_1 \mathbf{D}^W \times_2 \mathbf{D}^H \times_3 \mathbf{D}^S \big\| = \big\| \mathcal{X}^{(k)} - \mathrm{Sub}(\mathcal{Z}^{(k)}) \times_1 \mathbf{D}_k^W \times_2 \mathbf{D}_k^H \times_3 \mathbf{D}_k^S \big\|, \qquad (5)$$

    where $\mathrm{Sub}(\mathcal{Z}^{(k)}) \in \mathbb{R}^{r_k^W \times r_k^H \times r_k^S \times n_k}$ denotes the intrinsic sub-tensor of $\mathcal{Z}^{(k)}$, and the original nonlocal MSI DL problem can then be decomposed into a series of problems imposed on all FBP clusters ($k = 1, \cdots, K$):

    $$\min_{\mathbf{D}_k^W, \mathbf{D}_k^H, \mathbf{D}_k^S, \mathcal{Y}} \big\| \mathcal{X}^{(k)} - \mathcal{Y} \times_1 \mathbf{D}_k^W \times_2 \mathbf{D}_k^H \times_3 \mathbf{D}_k^S \big\|. \qquad (6)$$

    Note that after this transformation, the original problem (4) with constraints is reformulated into a series of smaller problems without constraints. This makes the problem much easier to solve.

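    The remaining problems are how to solve Eq. (6) and how to set the group-block-sparsity parameters $r_k^W, r_k^H, r_k^S$. It should be noted that each MSI cluster tensor $\mathcal{X}^{(k)}$ is dimensionally redundant in its 3rd (spectral) mode due to one of its important intrinsic properties: global correlation across spectrum. This implies that $\mathcal{X}^{(k)}$ can be approximated by a low-rank tensor obtained by:

    $$\min_{\mathbf{U}_1, \mathbf{U}_2, \mathbf{U}_3, \mathbf{U}_4, \mathcal{G}} \big\| \mathcal{X}^{(k)} - \mathcal{G} \times_1 \mathbf{U}_1 \times_2 \mathbf{U}_2 \times_3 \mathbf{U}_3 \times_4 \mathbf{U}_4 \big\|, \qquad (7)$$

    where $\mathbf{U}_1 \in \mathbb{R}^{d_k^W \times r_k^W}$, $\mathbf{U}_2 \in \mathbb{R}^{d_k^H \times r_k^H}$, $\mathbf{U}_3 \in \mathbb{R}^{d_k^S \times r_k^S}$ and $\mathbf{U}_4 \in \mathbb{R}^{d_k^N \times r_k^N}$ correspond to the basis vectors in the four modes of $\mathcal{X}^{(k)}$, with $d_k^W \ge r_k^W$, $d_k^H \ge r_k^H$, $d_k^S > r_k^S$ and $d_k^N \ge r_k^N$. Since $\mathcal{X}^{(k)} \in \mathbb{R}^{d_w \times d_h \times d_S \times n_k}$, the four mode sizes $d_k^W, d_k^H, d_k^S, d_k^N$ are $d_w, d_h, d_S, n_k$. Here $\mathcal{G} \in \mathbb{R}^{r_k^W \times r_k^H \times r_k^S \times r_k^N}$ is the so-called core tensor [24], and $r_k^S < d_k^S$ leads to the dimensionality reduction in the spectral mode of $\mathcal{X}^{(k)}$. Eq. (7) can be readily solved by the Tucker decomposition technique [24], and the solution of Eq. (6) can then be easily obtained by letting $\mathbf{D}_k^W = \mathbf{U}_1$, $\mathbf{D}_k^H = \mathbf{U}_2$, $\mathbf{D}_k^S = \mathbf{U}_3$ and $\mathcal{Y} = \mathcal{G} \times_4 \mathbf{U}_4$.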

    As for the selection of the rank parameters (i.e., the group-block-sparsity thresholds) $r_k^W, r_k^H, r_k^S$ and $r_k^N$ in Eq. (7), we adopt the well-known AIC/MDL method [27] on the mode-$i$ ($i = 1, 2, 3, 4$) flattening $\mathbf{X}_{(i)}^{(k)}$ of each cluster tensor $\mathcal{X}^{(k)}$. This simple method proved effective throughout all our experiments.
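The source names the AIC/MDL method but does not spell out the variant, so the following is only a sketch under assumptions: it uses the classical eigenvalue-based MDL criterion (the Wax and Kailath form), treats the columns of a mode flattening as samples, and does not center the data. The function name `mdl_rank` and the synthetic test matrix are hypothetical.

```python
import numpy as np

def mdl_rank(unfolding):
    """Estimate the number of dominant components of a p x N matrix with the MDL criterion."""
    p, N = unfolding.shape
    # Eigenvalues of the sample covariance, sorted in decreasing order.
    eigvals = np.linalg.eigvalsh(unfolding @ unfolding.T / N)[::-1]
    eigvals = np.maximum(eigvals, 1e-12)           # guard the logarithm
    scores = []
    for k in range(p):
        tail = eigvals[k:]                         # the p - k smallest eigenvalues
        log_geo = np.mean(np.log(tail))            # log of their geometric mean
        log_ari = np.log(np.mean(tail))            # log of their arithmetic mean
        fit = N * (p - k) * (log_ari - log_geo)    # how far the tail is from being flat
        penalty = 0.5 * k * (2 * p - k) * np.log(N)
        scores.append(fit + penalty)
    return int(np.argmin(scores))

# Synthetic flattening: a rank-2 signal in 10 dimensions plus weak white noise.
rng = np.random.default_rng(0)
signal = rng.standard_normal((10, 2)) @ rng.standard_normal((2, 2000))
unfolding = signal + 0.1 * rng.standard_normal((10, 2000))
estimated_rank = mdl_rank(unfolding)
```

Applied to each of the four flattenings of a cluster tensor, such an estimator would yield one rank per mode without any user-set threshold, which matches the parameter-free behaviour the text describes.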

    4.4. Decomposable Nonlocal MSI DL Algorithm

    Based on the aforementioned process, the decomposable nonlocal MSI DL algorithm can be summarized as Algorithm 1. We can then utilize $\mathcal{Z}^{(k)}$, $\mathbf{D}^W$, $\mathbf{D}^H$, $\mathbf{D}^S$ output by the proposed algorithm to recover all overlapping FBPs and average the results to obtain the denoised MSI. It should be noted that the $k$-means++ [3] (step 2), AIC/MDL [27] (step 3) and Tucker factorization [24] (step 4) techniques can all be implemented efficiently, which guarantees the efficiency of our algorithm in practice.

    Figure 3. Simulated RGB images using Columbia Multispectral Image Database.

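    Algorithm 1: Decomposable Nonlocal MSI DL

    Input: Input MSI $\mathcal{H} \in \mathbb{R}^{d_W \times d_H \times d_S}$.

    Output: Spatial dictionaries $\mathbf{D}^W = [\mathbf{D}_1^W, \cdots, \mathbf{D}_K^W]$, $\mathbf{D}^H = [\mathbf{D}_1^H, \cdots, \mathbf{D}_K^H]$, spectral dictionary $\mathbf{D}^S = [\mathbf{D}_1^S, \cdots, \mathbf{D}_K^S]$ and coefficient tensors $\mathcal{Z}^{(k)}$, $k = 1, \cdots, K$.

    1. Construct the entire FBP set of $\mathcal{H}$ (Section 4.1).
    2. Group all FBPs into cluster tensors $\mathcal{X}^{(k)} \in \mathbb{R}^{d_w \times d_h \times d_S \times n_k}$, $k = 1, \cdots, K$, by $k$-means++ (Section 4.2).
    3. Calculate the rank parameters $r_k^W$, $r_k^H$, $r_k^S$ and $r_k^N$ by applying the AIC/MDL method to $\mathbf{X}_{(1)}^{(k)}$, $\mathbf{X}_{(2)}^{(k)}$, $\mathbf{X}_{(3)}^{(k)}$ and $\mathbf{X}_{(4)}^{(k)}$, respectively (Section 4.3).
    4. Apply the tensor factorization technique to $\mathcal{X}^{(k)}$ by Eq. (7) to obtain $\mathbf{U}_1$, $\mathbf{U}_2$, $\mathbf{U}_3$, $\mathbf{U}_4$ and $\mathcal{G}$, and let $\mathbf{D}_k^W = \mathbf{U}_1$, $\mathbf{D}_k^H = \mathbf{U}_2$, $\mathbf{D}_k^S = \mathbf{U}_3$ and $\mathrm{Sub}(\mathcal{Z}^{(k)}) = \mathcal{G} \times_4 \mathbf{U}_4$. Reformulate the sub-tensor $\mathrm{Sub}(\mathcal{Z}^{(k)})$ to obtain $\mathcal{Z}^{(k)}$.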
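The final recovery step, in which all overlapping FBPs are put back and averaged, can be sketched as below. This is an assumption-laden illustration: the function name `aggregate_fbps` is hypothetical, patches are assumed to be stored in raster order with a stride of one pixel, and uniform averaging is assumed because the text only says that the results are averaged.

```python
import numpy as np

def aggregate_fbps(patches, msi_shape):
    """Average overlapping full-band patches (n, dw, dh, dS) back into an MSI of msi_shape."""
    dW, dH, dS = msi_shape
    _, dw, dh, _ = patches.shape
    acc = np.zeros(msi_shape)
    weight = np.zeros((dW, dH, 1))              # how many patches cover each pixel
    idx = 0
    for i in range(dW - dw + 1):
        for j in range(dH - dh + 1):
            acc[i:i + dw, j:j + dh, :] += patches[idx]
            weight[i:i + dw, j:j + dh, :] += 1.0
            idx += 1
    return acc / weight

# Round trip on a toy MSI: extracting every patch and averaging them back is the identity.
msi = np.random.default_rng(0).random((12, 10, 5))
dw = dh = 4
patches = np.stack([
    msi[i:i + dw, j:j + dh, :]
    for i in range(12 - dw + 1)
    for j in range(10 - dh + 1)
])
restored = aggregate_fbps(patches, msi.shape)
```

In the denoising pipeline the `patches` array would hold the reconstructed FBPs of every cluster, scattered back to their original positions, rather than the raw patches used in this round-trip check.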

    5. Experimental Results

    Columbia Datasets: We utilized the Columbia Multispectral Image Database [28]¹ to test the proposed method. This dataset contains 32 real-world scenes, each with spatial resolution 512 × 512 and spectral resolution 31, which includes full spectral resolution reflectance data collected from 400nm to 700nm in 10nm steps. The dataset covers a wide variety of real-world materials and objects, see Fig. 3. Each of these MSIs is scaled into the interval [0, 1] in our experiments.

    ¹ http://www1.cs.columbia.edu/CAVE/databases/multispectral

    Noise models: In the experiments we used two types of noise that commonly occur in real MSIs. One is additive white Gaussian noise (AWGN), which comes from many natural sources, such as the spontaneous thermal generation of electrons. The other is Poisson noise (also known as shot noise), which originates from the mechanism of quantized photons and uniform exposure [5]. We parameterized AWGN by its standard deviation $\sigma$ and Poisson noise by the variance $\mathcal{H}/2^{\kappa}$, where $\mathcal{H}$ is the noise-free signal. We designed two series of experiments. In the first, we perturbed each of the 32 Columbia MSIs with Gaussian noise of different $\sigma$ (up to 0.3) and Poisson noise with fixed $\kappa = 5$. In the second, we used $\kappa$ from 2 to 6 and fixed $\sigma = 0.1$.
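One way to simulate the Poisson/Gaussian mixture is sketched below. It rests on assumptions the source does not confirm: the Poisson variance is read as the clean signal divided by 2 to the power κ, it is realized by drawing Poisson counts at rate 2^κ times the signal and rescaling, and the Gaussian term is added afterwards. The function name `add_mixed_noise` is hypothetical.

```python
import numpy as np

def add_mixed_noise(clean, sigma, kappa, rng):
    """Corrupt a clean MSI in [0, 1] with signal-dependent shot noise plus AWGN."""
    scale = 2.0 ** kappa
    # Poisson(scale * H) / scale has mean H and variance H / scale.
    shot = rng.poisson(clean * scale) / scale
    return shot + rng.normal(0.0, sigma, size=clean.shape)

rng = np.random.default_rng(0)
clean = np.full((200, 200, 4), 0.5)           # constant toy signal so moments are easy to check
noisy = add_mixed_noise(clean, sigma=0.1, kappa=5, rng=rng)

# Under the assumptions above the two variances add: H / 2**kappa + sigma**2.
expected_var = 0.5 / 2 ** 5 + 0.1 ** 2
```

Because the shot-noise variance depends on the signal, darker bands are hit relatively harder, which is the situation the variance-stabilizing transformation in the experiments is meant to handle.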

    To remove the dependency of the noise variance on the underlying signal before denoising and to compensate for the bias in the filtered estimate, in all experiments the noisy MSI was first transformed by a variance-stabilizing transformation (VST) [19] before applying a denoising method. After denoising, the corresponding inverse transformation was used to obtain the final MSI reconstruction.

    Implementation details: Like most denoising methods based on nonlocal similarity, such as BM3D and BM4D, we employ a preprocessing step before the clustering step of our algorithm (Step 2). Our experiments show that a simple band-wise low-pass filtering greatly improves the accuracy of matching and the effectiveness of the following steps of our proposed denoising framework. It should be noted that the FBP width $d_w$ and height $d_h$ are the only two parameters that need to be set in our algorithm (all of the other parameters, including $K$, $r_k^W$, $r_k^H$, $r_k^S$ and $r_k^N$, can be automatically selected). In all our experiments, we simply set $d_w = d_h = 8$.

    Comparison methods: The comparison methods include: band-wise K-SVD [1]² and band-wise BM3D [9]³, state of the art for the 2D extended band-wise approach; 3D-cube K-SVD [1]², ANLM3D [18]⁴ and BM4D [16]³, state of the art for the 2D extended 3D-cube-based approach; LRTA [23] and PARAFAC [14], state of the art for the tensor-based approach⁵. All parameters involved in the competing algorithms were optimally assigned or automatically chosen as described in the reference papers.

    ² http://www.cs.technion.ac.il/~elad/software
    ³ http://www.cs.tut.fi/~foi/GCF-BM3D
    ⁴ http://personales.upv.es/jmanjon/denoising/arnlm.html
    ⁵ http://www.sandia.gov/~tgkolda/TensorToolbox/index-2.5.html

    Figure 4. (a) The images at two bands (400nm and 700nm) of chart and stuffed toy; (b) The corresponding images corrupted by the mixture of $\sigma = 0.2$ Gaussian noise and $\kappa = 5$ Poisson noise; (c)-(j) The restored images obtained by the 8 utilized MSI denoising methods: (c) BwK-SVD [1], (d) BwBM3D [9], (e) 3DK-SVD [1], (f) ANLM3D [18], (g) BM4D [16], (h) LRTA [23], (i) PARAFAC [14], (j) Ours. Two demarcated areas in each image are magnified 4 times for easy observation of details.

    Evaluation measures: To comprehensively assess the performance of all competing algorithms, we employ five quantitative picture quality indices (PQIs) for performance evaluation: peak signal-to-noise ratio (PSNR), structure similarity (SSIM [26]), feature similarity (FSIM [30]), erreur relative globale adimensionnelle de synthèse (ERGAS [25]) and spectral angle mapper (SAM [29]). PSNR and SSIM are two conventional PQIs in image processing and computer vision. They evaluate the similarity between the target image and the reference image based on MSE and structural consistency, respectively. FSIM emphasizes the perceptual consistency with the reference image. The larger these three measures are, the closer the target MSI is to the reference one. ERGAS and SAM usually appear together in the literature since they extract complementary information from an MSI. ERGAS measures the fidelity of the restored image based on the weighted sum of MSE in each band, and SAM calculates the average angle between the spectrum vectors of the target MSI and the reference one across all spatial positions. Unlike the former three measures, the smaller these two measures are, the better the target MSI estimates the reference one. Note that SAM directly reflects the fidelity of the spectral reflectance of the target MSI.
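Two of the five indices, PSNR and SAM, follow directly from their verbal definitions and can be sketched in a few lines. The exact conventions used in the experiments are not stated, so the following are assumptions: PSNR is computed over the whole MSI with peak value 1 (the data are scaled to [0, 1]), and SAM is the mean angle in radians between spectra stored along the last axis. The function names are hypothetical.

```python
import numpy as np

def psnr(ref, est, peak=1.0):
    """Peak signal-to-noise ratio in dB, based on the MSE over all pixels and bands."""
    mse = np.mean((ref - est) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def sam(ref, est, eps=1e-12):
    """Mean spectral angle (radians) between spectra of ref and est at every spatial position."""
    dot = np.sum(ref * est, axis=-1)
    norms = np.linalg.norm(ref, axis=-1) * np.linalg.norm(est, axis=-1)
    cos = np.clip(dot / (norms + eps), -1.0, 1.0)   # clip guards against rounding above 1
    return float(np.mean(np.arccos(cos)))

ref = np.full((4, 4, 3), 0.5)
est = ref + 0.1                      # uniform offset: MSE = 0.01, spectra stay parallel
psnr_value = psnr(ref, est)
sam_parallel = sam(ref, est)
sam_orthogonal = sam(np.array([[[1.0, 0.0]]]), np.array([[[0.0, 1.0]]]))
```

The toy case shows why the two indices are complementary: a uniform intensity offset lowers PSNR but leaves the spectral angle at zero, whereas a change in spectral shape is what SAM picks up.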

    Performance evaluation: For each noise setting, all five PQI values of each competing MSI denoising method on all 32 scenes were calculated and recorded. Table 1 lists the average performance (over different scenes and noise settings) of all methods in the Poisson/Gaussian mixture noise case. More details are listed in our supplementary material. The proposed method compares favorably with all other competing methods. In detail, our method performs marginally worse than BM4D with respect to only two PQI measures (PSNR and ERGAS) at some of the noise levels (say $\sigma \le 0.2$ with fixed $\kappa$, and $\kappa \ge 5$ with fixed $\sigma$; please see the supplementary material). On average, our method attains the best value for all PQIs, tying with BM4D only on PSNR in the first setting (Table 1).

    To further illustrate the denoising performance of our method, we show in Fig. 4 two bands in chart and stuffed toy, centered at 400nm (the darker one) and 700nm (the brighter one), respectively. Two demarcated areas in the scene are magnified 4 times in the figure for easy observation of details. It is easy to observe that our method obtains a better recovery of both small-scale textures and large-scale structures, especially when the band energy is low (see the dark channel).

    Based on the SAM measures in Table 1, our methodis substantiated to be able to best recover the spectral re-flectance of the MSIs as compared to other competing meth-ods. To further clarify this point, we demonstrate in Fig. 5

    σ = 0.1, 0.15, 0.2, 0.25, 0.3, κ = 5

    Method           PSNR           SSIM            FSIM            ERGAS            SAM
    Noisy image      14.52 ± 0.04   0.060 ± 0.032   0.469 ± 0.111   1156.9 ± 327.0   1.133 ± 0.194
    BwK-SVD [1]      25.77 ± 1.12   0.370 ± 0.027   0.792 ± 0.038   296.5 ± 58.4     0.614 ± 0.163
    BwBM3D [9]       33.45 ± 2.58   0.828 ± 0.048   0.910 ± 0.015   124.4 ± 34.0     0.340 ± 0.128
    3DK-SVD [1]      28.07 ± 1.29   0.497 ± 0.024   0.859 ± 0.025   232.5 ± 42.5     0.581 ± 0.170
    ANLM3D [18]      33.61 ± 2.24   0.836 ± 0.036   0.916 ± 0.020   121.1 ± 27.2     0.365 ± 0.142
    BM4D [16]        36.25 ± 2.16   0.885 ± 0.027   0.938 ± 0.013   90.0 ± 19.0      0.314 ± 0.131
    LRTA [23]        33.90 ± 2.74   0.850 ± 0.064   0.925 ± 0.017   118.4 ± 35.8     0.234 ± 0.082
    PARAFAC [14]     27.66 ± 2.93   0.747 ± 0.100   0.862 ± 0.053   243.3 ± 70.3     0.388 ± 0.117
    Ours             36.25 ± 2.56   0.914 ± 0.031   0.952 ± 0.008   88.9 ± 21.6      0.182 ± 0.070

    σ = 0.1, κ = 2, 3, 4, 5, 6

    Method           PSNR           SSIM            FSIM            ERGAS            SAM
    Noisy image      17.45 ± 1.02   0.088 ± 0.035   0.573 ± 0.094   756.3 ± 129.4    1.055 ± 0.194
    BwK-SVD          27.81 ± 1.85   0.528 ± 0.051   0.849 ± 0.031   224.4 ± 32.9     0.514 ± 0.145
    BwBM3D           34.14 ± 2.80   0.864 ± 0.046   0.926 ± 0.014   113.5 ± 32.4     0.267 ± 0.098
    3DK-SVD          29.95 ± 1.94   0.662 ± 0.048   0.901 ± 0.021   180.3 ± 28.7     0.486 ± 0.151
    ANLM3D           31.53 ± 1.53   0.686 ± 0.030   0.874 ± 0.035   155.8 ± 23.5     0.439 ± 0.145
    BM4D             36.93 ± 2.36   0.910 ± 0.025   0.951 ± 0.011   81.7 ± 17.1      0.259 ± 0.108
    LRTA             33.31 ± 3.20   0.812 ± 0.075   0.916 ± 0.022   140.7 ± 49.6     0.271 ± 0.100
    PARAFAC          30.27 ± 3.13   0.802 ± 0.084   0.896 ± 0.038   177.9 ± 60.9     0.323 ± 0.113
    Ours             36.94 ± 2.75   0.930 ± 0.029   0.963 ± 0.007   81.5 ± 20.5      0.150 ± 0.052

    Table 1. Average performance comparison of 8 competing methods with respect to 5 PQIs in the mixture noise experiments. For both settings, the results are obtained by averaging over the 32 scenes and the varied parameters.
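The "mean ± deviation" entries of a table like Table 1 can be formed along the following lines. This is only a sketch: it assumes that the scores of one method for one PQI are stored in an array of shape (scenes, settings) and that the reported spread is the standard deviation over all of these entries, which the caption does not specify. The names `summarize` and `format_entry` are hypothetical.

```python
import numpy as np

def summarize(scores):
    """Return (mean, std) of one PQI for one method.

    `scores` is assumed to have shape (n_scenes, n_settings); the statistics
    are taken over all entries (an assumption of this sketch).
    """
    scores = np.asarray(scores, dtype=float)
    return float(scores.mean()), float(scores.std())

def format_entry(mean, std, digits=2):
    """Format a table cell as 'mean ± std' with a fixed number of decimals."""
    return f"{mean:.{digits}f} ± {std:.{digits}f}"

# Usage with hypothetical PSNR values for 3 scenes and 2 noise settings.
example_scores = [[36.1, 35.2], [38.4, 37.0], [34.9, 33.8]]
print(format_entry(*summarize(example_scores)))
```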

    [Figure 5 plots. Panels (b)-(g): x-axis wave length / nm (400 to 700), y-axis diff. reflectance (−0.1 to 0.1). Legend: BwK-SVD, BwBM3D, 3DK-SVD, ANLM3D, BM4D, LRTA, PARAFAC, Ours.]

    Figure 5. (a) Simulated RGB image of sponges. (b)-(g) Spectral reflectance difference curves of 8 competing methods at 6 locations of sponges. The noise-free image is corrupted by the mixture of σ = 0.2 Gaussian noise and κ = 5 Poisson noise.

    the spectral reflectance difference curves of all competing methods at 6 locations in sponges. A spectral reflectance difference curve of an MSI denoising method at a spatial location is attained by sequentially interpolating the 31 elements of the deviation between the restored and the clean MSI along their spectral mode. It is easy to see that our method obtains the best approximation of the intrinsic spectral patterns of the original MSI, which fully complies with our quantitative evaluation.
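The construction of a spectral reflectance difference curve described above can be sketched as follows. The sketch assumes MSIs stored as (height, width, bands) arrays, 31 band centers spaced uniformly from 400nm to 700nm, and piecewise-linear interpolation between bands; the interpolation scheme and the function name `reflectance_difference_curve` are assumptions of this illustration.

```python
import numpy as np

def reflectance_difference_curve(restored, clean, row, col,
                                 band_centers, n_points=301):
    """Interpolate the spectral deviation (restored - clean) at one pixel.

    restored, clean : arrays of shape (height, width, bands)
    band_centers    : increasing wavelengths (nm), one per band
    Returns the dense wavelength grid and the interpolated deviation.
    """
    # Deviation along the spectral mode at the chosen spatial location.
    deviation = restored[row, col, :] - clean[row, col, :]
    # Dense wavelength grid spanning the first to the last band center.
    dense_wavelengths = np.linspace(band_centers[0], band_centers[-1], n_points)
    # Piecewise-linear interpolation (an assumption; other schemes are possible).
    curve = np.interp(dense_wavelengths, band_centers, deviation)
    return dense_wavelengths, curve

# Usage on hypothetical data: 31 bands from 400nm to 700nm.
bands = np.linspace(400.0, 700.0, 31)
rng = np.random.default_rng(0)
clean_msi = rng.uniform(0.0, 1.0, size=(16, 16, 31))
restored_msi = clean_msi + 0.02 * rng.standard_normal(clean_msi.shape)
wavelengths, diff_curve = reflectance_difference_curve(restored_msi, clean_msi, 5, 7, bands)
```

A curve that stays close to zero over the whole wavelength range indicates that the spectrum at that location is well recovered.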

    MSI denoising performance on natural scenes: We also used some MSIs from real-world scenes [11]⁶ to test our denoising method. This dataset comprises 15 rural scenes (containing rocks, trees, leaves, grass, earth, etc.) and 15 urban scenes (containing walls, roofs, windows, plants, indoor, etc.). All of them are illuminated by direct sunlight between mid-morning and mid-afternoon. As the images are taken from a fairly far distance and the energy is spread over all bands, these MSIs contain a certain degree of noise. We employed implementation strategies and parameter settings similar to those of the first series of experiments, and compared with the same competing methods.

    ⁶ http://personalpages.manchester.ac.uk/staff/david.foster/Hyperspectralimagesof naturalscenes02

    Our experimental results show that our method can generally improve the quality of the images contained in these MSIs. For easy observation, we illustrate in Fig. 6 an example image located at one band of a rural MSI. An area of interest is magnified in the restored images obtained by all competing methods. It can be easily observed that the image restored by our method properly removes the noise while finely preserving the structure underlying the image, whereas the results obtained by most of the other competing methods contain evidently blurry areas as compared with the original image. Among these methods, ANLM3D and LRTA perform comparatively better in structure preservation. However, the images recovered by them retain more unexpected sharp noise than that obtained by our method.

    (a) Natural image (b) BwK-SVD [1] (c) BwBM3D [9] (d) 3DK-SVD [1] (e) ANLM3D [18] (f) BM4D [16] (g) LRTA [23] (h) PARAFAC [14] (i) Ours

    Figure 6. (a) The image located at the 12th band in scene 2 of the natural scene dataset. (b)-(i) The restored images obtained by the 8 utilized MSI denoising methods. The demarcated area in each image is amplified for easy observation of details. The figure is better seen by zooming on a computer screen.

    Acknowledgement

    This research was supported by the 973 Program of China with No. 3202013CB329404 and the NSFC projects with No. 61373114, 11131006, 6107505.

    References

    [1] M. Aharon, M. Elad, and A. Bruckstein. K-svd: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Signal Processing, 54(11):4311–4323, 2006.

    [2] B. Aiazzi, L. Alparone, A. Barducci, S. Baronti, and I. Pippi. Information theoretic assessment of sampled hyperspectral imagers. IEEE Trans. Geoscience and Remote Sensing, 39(7):1447–1458, 2001.

    [3] D. Arthur and S. Vassilvitskii. K-means++: the advantages of careful seeding. Society for Industrial and Applied Mathematics.

    [4] A. Buades, B. Coll, and J. M. Morel. A non-local algorithm for image denoising. In CVPR, 2005.

    [5] P. D. Burns. Analysis of Image Noise in Multispectral Color Acquisition. PhD thesis, Center for Imaging Science, Rochester Institute of Technology, 1997.

    [6] C. F. Caiafa and A. Cichocki. Computing sparse representations of multidimensional signals using kronecker bases. Neural Computation, 25(1):186–220, 2013.

    [7] J. D. Carroll and J. J. Chang. Analysis of individual differences in multidimensional scaling via an n-way generalization of eckart-young decomposition. Psychometrika, 35(3):283–319, 1970.

    [8] A. Chen. The inpainting of hyperspectral images: a survey and adaptation to hyperspectral data. In SPIE, 2012.

    [9] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image denoising by sparse 3d transform-domain collaborative filtering. IEEE Trans. Image Processing, 16(8):2080–2094, 2007.

    [10] L. De Lathauwer, B. De Moor, and J. Vandewalle. On the best rank-1 and rank-(r1, r2, · · · , rn) approximation of higher-order tensors. SIAM Journal on Matrix Analysis and Applications, 21(4):1324–1342, 2000.

    [11] D. H. Foster, K. Amano, S. M. C. Nascimento, and M. J. Foster. Frequency of metamerism in natural scenes. Journal of the Optical Society of America A, 23(10):2359–2372, 2006.

    [12] R. Kawakami, J. Wright, Y. W. Tai, Y. Matsushita, and M. K. Ikeuchi. High-resolution hyperspectral imaging via matrix factorization. In CVPR, 2011.

    [13] J. Kerekes and J. Baum. Full-spectrum spectral imaging system analytical model. IEEE Trans. Geoscience and Remote Sensing, 5(2):571–580, 2005.

    [14] X. F. Liu, S. Bourennane, and C. Fossati. Denoising of hyperspectral images using the parafac model and statistical performance analysis. IEEE Trans. Geoscience and Remote Sensing, 50(10):3717–3724, 2012.

    [15] M. Maggioni and A. Foi. Nonlocal transform-domain denoising of volumetric data with groupwise adaptive variance estimation. In SPIE, 2012.

    [16] M. Maggioni, V. Katkovnik, K. Egiazarian, and A. Foi. A nonlocal transform-domain filter for volumetric data denoising and reconstruction. IEEE Trans. Image Process., 22(1):119–133, 2013.

    [17] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman. Non-local sparse models for image restoration. In ICCV, 2009.

    [18] J. V. Manjón, P. Coupé, L. Martí-Bonmatí, L. Collins, and M. Robles. Adaptive non-local means denoising of mr images with spatially varying noise levels. Journal of Magnetic Resonance Imaging, 31(1):192–203, 2010.

    [19] M. Mäkitalo and A. Foi. Optimal inversion of the generalized anscombe transformation for poisson-gaussian noise. IEEE Trans. Image Process., 22(1):91–103, 2013.

    [20] M. Elad and M. Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans. Image Processing, 15(12):3736–3745, 2006.

    [21] H. V. Nguyen, A. Banerjee, and R. Chellappa. Tracking via object reflectance using a hyperspectral video camera. In CVPR Workshop, 2010.

    [22] Y. T. Qian, Y. H. Shen, M. C. Ye, and Q. Wang. 3-d nonlocal means filter with noise estimation for hyperspectral imagery denoising. In IGARSS, 2012.

    [23] N. Renard, S. Bourennane, and J. Blanc-Talon. Denoising and dimensionality reduction using multilinear tools for hyperspectral images. IEEE Trans. Geoscience and Remote Sensing, 5(2):138–142, 2008.

    [24] L. R. Tucker. Some mathematical notes on three-mode factor analysis. Psychometrika, 31(3):279–311, 1966.

    [25] L. Wald. Data fusion: definitions and architectures: fusion of images of different spatial resolutions. Presses de l’Ecole des Mines, 2002.

    [26] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.

    [27] M. Wax and T. Kailath. Detection of signals by information theoretic criteria. IEEE Transaction on Acoustics, Speech and Signal Processing, 33(2):387–392, 1985.

    [28] F. Yasuma, T. Mitsunaga, and D. Iso. Generalized assorted pixel camera: post-capture control of resolution, dynamic range and spectrum. IEEE Transaction on Image Processing, 19(9):2241–2253, 2010.

    [29] R. H. Yuhas, J. W. Boardman, and A. F. H. Goetz. Determination of semi-arid landscape endmembers and seasonal trends using convex geometry spectral unmixing techniques. In Summaries of the 4th Annual JPL Airborne Geoscience Workshop, 1993.

    [30] L. Zhang, L. Zhang, X. Q. Mou, and D. Zhang. Fsim: a feature similarity index for image quality assessment. IEEE Transactions on Image Processing, 20(8):2378–2386, 2011.

    [31] S. Zubair, W. Dai, and W. Wang. Sparseness constrained tensor factorization algorithm for dictionary learning over high-dimensional space. In IMA, 2012.