

Multi-Scale Dictionary Learning via Cross-Scale Cooperative Learning and Atom Clustering for Visual Signal Processing

Jie Chen, Lap-Pui Chau

Abstract—For sparse signal representation, the sparsity across the scales is a promising yet under-investigated direction. In this work, we aim at designing a multi-scale sparse representation scheme to explore such potential. A multi-scale dictionary (MD) structure is designed. A Cross-scale Matching Pursuit (CMP) algorithm is proposed for multi-scale sparse coding. Two dictionary learning methods, Cross-scale Cooperative Learning (MD/CCL) and Cross-scale Atom Clustering (MD/CAC), are proposed, each focusing on one of the two important attributes of an efficient multi-scale dictionary: the similarity and the uniqueness of corresponding atoms in different scales. We analyze and compare their respective advantages in the application of image denoising under different noise levels, where both methods produce state-of-the-art denoising results.

Index Terms—multi-scale sparse representation; cross-scale learning; dictionary atom clustering.

I. INTRODUCTION

Images of natural scenes tend to produce repeated patterns. These primary patterns may appear at different locations, with various translations, rotations, and scalings. Our daily visual perception experience tells us that pattern similarity in a natural scene resides not only within a single scale, but across multiple ones. An obvious example is shown in Fig. 1(a), where seven matryoshka dolls are juxtaposed in two rows. Most of the details on the dolls are identical and vary only in scale. Other examples, such as the spiral steps in Fig. 1(b) and the intricate layered carvings on the Gothic-style arch in Fig. 1(c), gradually diminish in a perspective view. Less obvious but still discernible is the vein texture on the wings of a monarch butterfly in Fig. 1(d). All these images are good examples of primary patterns repeating themselves across scales in natural imagery.

It is a well-observed fact that visual signals reside in a much lower-dimensional region of the signal space, and their distribution tends to cluster around multiple local centroids. This observation has given rise to a thriving line of research on example-based dictionary learning for sparse image representation, which has enjoyed enormous success in computer vision and pattern recognition applications in recent years. The key characteristic of an efficient dictionary is its atoms' ability to generalize the distribution centroids of possible signals. Essentially, if all the

Copyright (c) 2014 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending an email to [email protected].

The authors are with the School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore, 639798. (e-mail: [email protected], [email protected]).

Fig. 1: Four natural images with “multi-scale” patterns.

signal centroids with high frequencies of occurrence are known and used as dictionary atoms, the dictionary can be considered a good one.

Dictionary learning is most frequently expressed as the following optimization problem [1]:

min_α ||α||₀ subject to ||x − Dα||₂ ≤ ε, (1)

where x ∈ R^{N×1} is the input signal, which is an image patch reshaped into a vector, D ∈ R^{N×K} is the dictionary with K atoms, and α is the coding coefficient vector for x, whose non-zero elements are limited to a small number to ensure that the representation is sparse. Numerous works, such as convex relaxation [2] and greedy iterative update methods [3], have been proposed to solve the problem in Eqn. (1). With the signal x and the dictionary D stretching across multiple scales, the aim of this paper is to investigate the potential of cross-scale atom similarity based on the framework of Eqn. (1).
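As a rough illustration of how Eqn. (1) is typically solved greedily, a sketch in the spirit of OMP [28] follows; this is not the authors' implementation, and all function and variable names here are ours:

```python
import numpy as np

def omp(D, x, eps, max_atoms=None):
    """Greedy sparse coding: min ||alpha||_0 s.t. ||x - D @ alpha||_2 <= eps.

    D: (N, K) dictionary with unit-norm columns; x: (N,) signal.
    Returns the sparse coefficient vector alpha of length K.
    """
    N, K = D.shape
    if max_atoms is None:
        max_atoms = N
    support = []
    alpha = np.zeros(K)
    residual = x.copy()
    while np.linalg.norm(residual) > eps and len(support) < max_atoms:
        # pick the atom most correlated with the current residual
        k0 = int(np.argmax(np.abs(D.T @ residual)))
        if k0 in support:          # numerical stagnation guard
            break
        support.append(k0)
        # re-fit all selected coefficients by least squares on the support
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        alpha[:] = 0.0
        alpha[support] = coef
        residual = x - D @ alpha
    return alpha
```

Because the coefficients are re-fitted by least squares at every step, the residual stays orthogonal to the selected atoms, so the loop keeps selecting new atoms until the error target is met.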

II. RELATED WORK

The idea of “cross-scale sparsity” is not new. In Mallat's multi-resolution analysis (MRA) [4] framework, a scaling and wavelet function pair {φ, ψ} is chosen, and then the scaled and shifted versions of ψ are integrated with the signal to transform it into domains of different resolutions. The parsimony


of the transformation coefficients relies on the design of {φ, ψ}, whose shape should be a globally optimal pattern in the subspace of the signal across the scales. The wavelet decomposition is a celebrated achievement in this direction [5]. Some scale-adaptive image denoising algorithms based on the wavelet structure deliver very good performance, as reported in [6], [7], [8], [9].

With the recent success of example-based dictionary learning for sparse signal representation [10], and its overwhelming comparative advantage over pre-constructed wavelet methods in various applications, such as visual signal compression [11] [12], denoising [13], interpolation [14] [15], and visual recognition [16]–[19], the idea of adapting the concept of “scaling” from wavelets to dictionary-learning-based methods looks promising.

One line of work focuses on learning dictionaries for the wavelet coefficients directly. Ophir et al. [20] proposed to train sub-dictionaries on different wavelet bands, which managed to squeeze out some of the redundancy left by the wavelet decomposition, specifically the spatial correlation between wavelet coefficients in the same band or between bands, and is able to produce a much sparser representation than wavelets and single-scale K-SVD. Yan et al. [21] further proposed to utilize clustering on data samples to train many sub-dictionaries at each wavelet decomposition level, improving coding efficiency and reducing artifacts. A similar work was done by Skretting et al. [22], where dictionaries are trained in both the image and wavelet domains for the application of image compression.

Another line of work takes advantage of the image pyramid model. In the recent work of Hughes et al. [23], after decomposing an input image into a pyramid of distinct frequency bands, sparse coding and dictionary learning are performed on the individual levels of the resulting pyramid. In their model, a set of dictionary atoms is shared and learned across all scales, and a statistical model that allows for efficient inference of parameters is proposed. Another work, by Liu et al. [24], investigated the idea of dictionary refinement from lower to higher scales, which leads to a more globally optimal dictionary. Although the final output dictionary is still single-scale, we can see the process of pattern generalization across different scales.

All these methods, whether using the wavelet transform or pyramid decomposition, share one common trait: the “scaling” is brought about by a separate transform/decomposition operator, but the dictionaries themselves remain “single-scale”; the efficiency of combining a separate transform operator with sparse coding is questionable.

Another line of work focuses on designing a dictionary that is itself multi-scale. Mairal et al. [25] [26] used a multi-scale quadtree model, where a big image patch is decomposed along the tree into sub-patches of smaller scales, and a dictionary is learnt at each scale. The quadtree structure constrains smaller patches from shifting inside larger ones. Different dictionary scales cooperate and combine to depict a signal of interest, which produces a much sparser representation than its single-scale counterparts. This model produces convincing results in image denoising. However, being independent of a cross-scale operator (e.g., a wavelet transform operator) implies that the model, although multi-scale, overlooks the cross-scale dependencies of the signal, which makes it sub-optimal in this sense.

In [27] Aharon et al. gave an interesting example of how dictionary atoms in a different form (a 2D dictionary in that case) can be manipulated within the existing scheme of sparse coding and dictionary learning. Although on a different topic, this work inspired the authors in how to bring signals from different scales into one processing framework.

III. OUR CONTRIBUTIONS

We aim at designing a more direct and efficient multi-scale sparse representation framework, separate from the wavelet decomposition that most previously available multi-scale schemes involve. The designed multi-scale dictionary is multi-resolution in itself, with dictionary atoms learnt for different signal scales. Cross-scale sparsity and similarity are carefully investigated.

Based on this idea, a Cross-scale Matching Pursuit (CMP) algorithm is proposed for multi-scale sparse coding. Two dictionary learning methods, Cross-scale Cooperative Learning (MD/CCL) and Cross-scale Atom Clustering (MD/CAC), are proposed, each focusing on one of the two important attributes of an efficient multi-scale dictionary, i.e., the pattern similarity and the uniqueness of corresponding atoms in different scales. We analyze and compare their respective advantages in the application of image denoising under different noise levels, where both methods produce state-of-the-art denoising results.

The rest of this paper is organized as follows. In Section IV the structure of the proposed multi-scale dictionary is explained; the sparse coding algorithm based on this dictionary structure is also introduced in that section. In Section V two different dictionary learning strategies, MD/CCL and MD/CAC, are explained in detail. The parameter analysis of the proposed multi-scale scheme, as well as its image denoising performance, is presented in Section VI. Section VII concludes the work.

IV. MULTI-SCALE SPARSE REPRESENTATION MODEL

In this section, we explain the structure of the proposed multi-scale dictionary, and then proceed to introduce the Cross-scale Matching Pursuit (CMP) sparse coding algorithm.

First, an image I is decomposed into overlapped patches at different scales, i.e., patches of different sizes √N_l × √N_l. Then, the patches are reshaped into vectors x^l_i ∈ R^{N_l×1}, where l = 1, ..., L is the scale index, and L is the total number of scales used. Patches are placed on corresponding points on grids in different scales, and they are indexed by their top-left pixel i. We do not use the central pixel to index a patch since we do not assume that cross-scale similarity resides at the same indexed location or anywhere in its vicinity.

Let D^l ∈ R^{N_l×K} be the dictionary in scale l (K is the number of dictionary atoms in each scale). The multi-scale dictionary is the collection {D^l | l = 1, ..., L}. For every larger scale in D^l, the dictionary atom dimension increases by a fixed ratio compared to the previous scale, and the image patch data size increases accordingly.

A. The Cross-Scale Matching Pursuit

Assume that we have a multi-scale dictionary {D^l | l = 1, ..., L}; the Cross-scale Matching Pursuit (CMP) algorithm aims to find the sparse coding coefficients for a signal of interest.

Let α^l_i be the sparse coding coefficient vector for the signal x^l_i over the dictionary D^l. Orthogonal Matching Pursuit (OMP) [28] is applied to calculate the sparse vectors within each single scale. Let N_l be the dimensionality of x^l_i. The average pixel representation error ε^l_i can be calculated as:

ε^l_i = ||x^l_i − D^l α^l_i||²₂ / N_l, (2)

For the group of signal vectors with the same index i but from different scales {x^l_i | l = 1, ..., L}, one scale l₀ will be chosen if the dictionary atoms in D^{l₀} better represent the signal. The selection criterion is defined as:

l₀ = argmin_l ε^l_i · f(||α^l_i||₀), (3)

where ||α^l_i||₀ counts the number of non-zero coefficients in α^l_i, and f(t) is a kernel function positively proportional to the parameter t. In our current implementation, f(t) = √t.

The criterion in Eqn. (3) takes two factors into account during scale selection. The first factor is the representation error: for a fair comparison across scales, the vector error is divided by its dimensionality, which measures the average pixel error. The second factor is the representation sparsity, measured by the ℓ₀ norm (number of non-zero coefficients) of the coefficient vector. Together, the two factors ensure the fidelity and sparsity of a representation, which are the key concerns of CMP. The steps of the CMP algorithm are summarized in Algorithm 1.
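The scale-selection rule of Eqn. (3) with f(t) = √t reduces to a few lines; this is an illustrative sketch with our own naming:

```python
import numpy as np

def select_scale(errors, cardinalities):
    """Scale selection of Eqn. (3): pick l0 minimising eps_i^l * sqrt(||alpha_i^l||_0).

    errors: per-scale average pixel errors eps_i^l (already divided by N_l);
    cardinalities: per-scale values of ||alpha_i^l||_0.
    Returns the 0-based index of the chosen scale.
    """
    errors = np.asarray(errors, dtype=float)
    card = np.asarray(cardinalities, dtype=float)
    scores = errors * np.sqrt(card)  # f(t) = sqrt(t)
    return int(np.argmin(scores))
```

Note how the √ penalty lets a slightly less accurate but much sparser scale win the selection.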

V. MULTI-SCALE DICTIONARY LEARNING

An efficient dictionary is able to maximize the sparsity level and minimize the reconstruction error of a representation, as expressed in Eqn. (1). In an effort to further promote sparsity in a multi-scale set-up, we propose two dictionary learning methods: Multi-scale Dictionary Cross-scale Cooperative Learning (MD/CCL), and Multi-scale Dictionary Cross-scale Atom Clustering (MD/CAC). Each method focuses on one of the two important attributes of an efficient multi-scale dictionary, i.e., the pattern similarity and the uniqueness of corresponding atoms in different scales. Both methods bear distinctive advantages under different circumstances, the details of which are explained in this section.

A. Multi-scale Dictionary Cross-scale Cooperative Learning (MD/CCL)

As a first attempt, we implement a dictionary structure in which each corresponding atom in every larger scale equals an atom from the base scale (l = 1), up to a linear interpolation:

D^l = {d^l_k | d^l_k = R_{l1} d^1_k, k = 1, ..., K}. (4)

Algorithm 1: The cross-scale matching pursuit.
input parameters: input image I; multi-scale dictionary {D^l | l = 1, ..., L}; target coding error ε₀.
- Decompose I into patches of different scales, and form them into vectors: {x^l_i | i = 1, ...; l = 1, ..., L}.
for each image patch i = 1, ... do
    for each dictionary scale l = 1, ..., L do
        - Set the initial support for x^l_i as S^l_i = ∅; set the initial residual r^l_i = x^l_i and error ε^l_i = ||x^l_i||²₂ / N_l.
        while ε^l_i > ε₀ do
            - Sweep the current residual r^l_i through all atoms in D^l; find the atom d^l_{k₀} that produces the largest projection: k₀ = argmax_k |d^{lT}_k r^l_i|.
            - Update the support: S^l_i = S^l_i ∪ {d^l_{k₀}}.
            - Calculate the coefficients such that Supp(α^l_i) = S^l_i: α^l_i = argmin_α ||D^l α − x^l_i||²₂.
            - Update the current residual and error: r^l_i = x^l_i − D^l α^l_i; ε^l_i = ||r^l_i||²₂ / N_l.
        end
    end
    - Select the scale l₀ = argmin_l ε^l_i · f(||α^l_i||₀); set the coefficient vectors α^l_i = 0 for all l ≠ l₀.
end

Fig. 2: Demonstration of the scale transform matrix R_{lp}. Bilinear interpolation is used for the transformation.

where d^l_k is the kth dictionary atom in D^l ∈ R^{N_l×K}. The matrix R_{lp} (here p = 1) is a linear operator that bilinearly interpolates a dictionary atom from d^p_k ∈ R^{N_p×1} to d^l_k ∈ R^{N_l×1}. R_{lp} relates each pixel to be interpolated with the locations of the original pixels that contribute to the interpolation, and it stores the bilinear coefficients. The functional diagram of R_{lp} is illustrated in Fig. 2. In such a structure, dictionary atoms with the same index k at different scales are related by R_{lp}, which enforces cross-scale similarity for every atom across the scales.
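A minimal sketch of constructing such a bilinear operator R_{lp} as an explicit matrix; this is our own construction, assuming standard align-corners bilinear weights (the paper does not specify these details):

```python
import numpy as np

def bilinear_operator(n_src, n_dst):
    """Build the matrix R (n_dst^2 x n_src^2) that bilinearly resamples a
    vectorised n_src x n_src atom to n_dst x n_dst, as a linear map.
    Works as an interpolator (n_dst > n_src) or decimator (n_dst < n_src).
    """
    R = np.zeros((n_dst * n_dst, n_src * n_src))
    scale = (n_src - 1) / (n_dst - 1) if n_dst > 1 else 0.0
    for r in range(n_dst):
        for c in range(n_dst):
            y, x = r * scale, c * scale            # source coordinates
            y0, x0 = int(np.floor(y)), int(np.floor(x))
            y1, x1 = min(y0 + 1, n_src - 1), min(x0 + 1, n_src - 1)
            dy, dx = y - y0, x - x0
            row = r * n_dst + c
            # accumulate the four bilinear weights for this target pixel
            R[row, y0 * n_src + x0] += (1 - dy) * (1 - dx)
            R[row, y0 * n_src + x1] += (1 - dy) * dx
            R[row, y1 * n_src + x0] += dy * (1 - dx)
            R[row, y1 * n_src + x1] += dy * dx
    return R
```

Each row of R sums to 1, so rescaling an atom is a single matrix-vector product, which is what makes the cross-scale constraint of Eqn. (4) a linear relation between the scales.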

Suppose we already have the sparse coding coefficients α^l_i obtained by CMP on {D^l | l = 1, ..., L}. We propose to update the dictionary to minimize the following target error function:

d^1_k = argmin_{d^1_k} Σ_{l=1}^{L} ||x^l_i − Σ_{k=1}^{K} R_{l1} d^1_k α^l_i(k)||²₂ / N_l. (5)

Obviously, the only variable in Eqn. (5) is d^1_k, which is from the base scale (l = 1); all the larger-scale atoms will be calculated according to Eqn. (4) once d^1_k is decided.

The Stochastic Gradient (SG) descent method [29] is used to update the multi-scale dictionary. The SG method has been used quite extensively in adaptive filters (least mean squares), computerized tomography, neural network learning algorithms, etc. According to Eqn. (5), the partial derivative of the target function over d^1_k is:

∂/∂d^1_k Σ_{l=1}^{L} ||x^l_i − Σ_{k=1}^{K} R_{l1} d^1_k α^l_i(k)||²₂ / N_l
= 2 × Σ_{l=1}^{L} (Σ_{k=1}^{K} R_{l1} d^1_k α^l_i(k) − x^l_i)(Σ_{k=1}^{K} α^l_i(k) R_{l1}) / N_l, (6)

the dictionary update equation can therefore be written as:

(d^1_k)_new = (d^1_k)_old − μ · Σ_{l=1}^{L} (Σ_{k=1}^{K} R_{l1} d^1_k α^l_i(k) − x^l_i)(Σ_{k=1}^{K} α^l_i(k) R_{l1}) / N_l, (7)

where μ is the step-size parameter, which decides the dictionary update rate after each incoming image patch x^l_i. The SG algorithm converges well given that the number of image patches (across all scales) is very large. The name “cooperative learning” reflects the fact that signals from all scales contribute during the update of the base-scale atoms. Such a cooperative learning manner is very beneficial for signal/image denoising, since noise is usually scale-independent; jointly using image samples from different scales therefore averages out the noise more effectively.
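A sketch of one MD/CCL stochastic-gradient step follows, written in the mathematically explicit form with the back-projection R_{l1}ᵀ (Eqn. (7) writes the gradient more loosely); the names and data layout are our assumptions:

```python
import numpy as np

def ccl_update(D1, R, x, alpha, mu):
    """One stochastic-gradient step of the MD/CCL base-scale update (cf. Eqn. (7)).

    D1: (N_1, K) base-scale dictionary; R: {l: (N_l, N_1) rescaling matrix},
    with R[1] the identity; x, alpha: per-scale patch / coefficient vectors.
    mu: step size. The residual is back-projected through R[l].T, the explicit
    form of the gradient of Eqn. (5).
    Returns the updated base-scale dictionary.
    """
    D1 = D1.copy()
    grad = np.zeros_like(D1)
    for l, Rl in R.items():
        Nl = Rl.shape[0]
        Dl = Rl @ D1                      # larger-scale dictionary, Eqn. (4)
        resid = Dl @ alpha[l] - x[l]      # representation error at scale l
        # d/d(D1) of ||resid||^2 / N_l, accumulated over scales
        grad += (Rl.T @ np.outer(resid, alpha[l])) * (2.0 / Nl)
    return D1 - mu * grad
```

Every scale contributes a gradient term for the same base-scale atoms, which is exactly the "cooperative" aspect described above.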

B. Multi-scale Dictionary Cross-scale Atom Clustering (MD/CAC)

Unlike the “brute force” used in MD/CCL, where all atoms across all scales are forced to be bilinearly similar, we propose a more adaptive learning method that aims at enforcing cross-scale similarities while preserving scale-unique patterns at the same time.

With the coding coefficient vectors α^l_i calculated by CMP, we introduce a new dictionary structure in which a given atom d^l_k has a matched similar atom d^{lc}_{kc} in another scale. The matching is done according to the following criterion:

(kc, lc) = argmin_{j, p} ||d^l_k − R_{lp} d^p_j||²₂. (8)

Eqn. (8) basically looks for the most similar atom in the rescaled dictionary R_{lp} D^p. R_{lp} is the same scale-transform matrix as in Eqn. (4), except that for MD/CCL, R_{lp} is always an interpolator (p = 1, p < l), whereas for MD/CAC, R_{lp} can be either an interpolator or a decimator depending on the relation between p and l.
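The matching rule of Eqn. (8) can be sketched as an exhaustive search over rescaled atoms (an illustrative sketch; the dictionary and rescaler containers are our assumed data layout):

```python
import numpy as np

def match_atom(d, l, dictionaries, rescalers):
    """Cross-scale atom matching of Eqn. (8): find the atom in another scale
    whose rescaled version is closest to d (an atom of scale l).

    dictionaries: {p: (N_p, K) array}; rescalers: {(l, p): (N_l, N_p) matrix}
    mapping scale-p atoms into scale l (interpolator or decimator).
    Returns the pair (k_c, l_c).
    """
    best = (None, None)
    best_dist = np.inf
    for p, Dp in dictionaries.items():
        if p == l:                                 # only match across scales
            continue
        rescaled = rescalers[(l, p)] @ Dp          # R_lp D^p, one column per atom
        dists = np.sum((rescaled - d[:, None]) ** 2, axis=0)
        j = int(np.argmin(dists))
        if dists[j] < best_dist:
            best_dist = dists[j]
            best = (j, p)
    return best
```

Note the search is over all other scales jointly, so the map is neither injective nor surjective, a property the experiments in Section VI examine.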

The dictionary learning can be thought of as a cross-scale clustering process. Matched atoms are drawn toward each other when updated according to Eqn. (9):

d^l_k = argmin_{d^l_k} Σ_{i∈S^l_k} ||[x^l_i − Σ_{m≠k} d^l_m α^l_i(m)] − d^l_k α^l_i(k)||²₂ + λ ||d^l_k − R_{lp} d^p_{kc}||²₂, (9)

where the set S^l_k is formed by finding the patches in scale l that used atom d^l_k during CMP coding. According to the previous section, only the patches for which CMP selected scale l as the best representative scale at position i will have a value for α^l_i; otherwise, α^l_i = 0. Therefore, S^l_k = {i | α^l_i(k) ≠ 0}. The first term of Eqn. (9) calculates the representation error when atom d^l_k is not used; the second term is the cross-scale atom clustering term. Similar dictionary atoms across scales are drawn closer after each update because of this term. λ is a tuning parameter.

Our algorithm is inspired by K-SVD [13], and comprises two steps at each iteration: CMP coding and dictionary update. The CMP algorithm outputs the coding coefficients α^l_i based on the current multi-scale dictionary {D^l | l = 1, ..., L}, and then the cross-scale atom clustering algorithm updates the dictionary based on the current α^l_i. Each atom in each dictionary scale is updated separately. For the dictionary atom d^l_k of scale l with index k, the algorithm finds the set S^l_k that consists of the image patches that used d^l_k during the last CMP pass. For each x^l_i ∈ S^l_k, the representation error is calculated as:

ε^l_i = x^l_i − Σ_{m≠k} d^l_m α^l_i(m),  x^l_i ∈ S^l_k, (10)

where the contribution from the atom d^l_k is excluded. All the ε^l_i calculated from x^l_i ∈ S^l_k are then juxtaposed as columns to form the matrix E^l_k ∈ R^{N_l×Σ^l_k}, where Σ^l_k equals the number of elements in the set S^l_k.

The algorithm then finds the nearest atom d^p_{kc} in scale p (p ≠ l) based on Eqn. (8), and uses it as the current clustering centroid:

δ^l_k = R_{lp} d^p_{kc}. (11)

δ^l_k is then appended as the last column of E^l_k to form the matrix M^l_k ∈ R^{N_l×(Σ^l_k+1)}:

M^l_k = [E^l_k, λ·δ^l_k]. (12)

Here, λ is a tuning parameter. The algorithm then applies the SVD M^l_k = UΔVᵀ. The column of U that corresponds to the largest singular value in Δ is used to update the dictionary atom d^l_k; the atom's coding coefficients {α^l_i(k)}_{i∈S^l_k} are updated with the first Σ^l_k elements of the column of V that also corresponds to the largest singular value in Δ.

After going through all the dictionary atoms in scale l, the algorithm iterates through the atoms in all other scales, the only difference being the scaling operator R_{lp}, which may change from a bilinear interpolator to a decimator.
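The per-atom SVD update of Eqns. (10)-(12) can be sketched as follows; following the K-SVD convention, the coefficients are scaled by the leading singular value (a detail the text omits), and the names are ours:

```python
import numpy as np

def cac_atom_update(E, delta, lam):
    """MD/CAC atom update (Eqns. (10)-(12)): append the matched, rescaled
    atom delta (weighted by lam) to the error matrix E, take the rank-1 SVD,
    and read off the new atom and its coefficients.

    E: (N_l, S) error matrix with one column per patch using this atom;
    delta: (N_l,) clustering centroid R_lp d^p_kc; lam: tuning parameter.
    Returns (new_atom, new_coeffs) with new_coeffs of length S.
    """
    M = np.column_stack([E, lam * delta])          # M^l_k = [E^l_k, lam*delta^l_k]
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    new_atom = U[:, 0]                             # unit-norm left singular vector
    # coefficients: first S entries of the top right singular vector,
    # scaled by the largest singular value (K-SVD convention)
    new_coeffs = s[0] * Vt[0, :-1]
    return new_atom, new_coeffs
```

With lam = 0 this degenerates to a plain K-SVD atom update; increasing lam pulls the updated atom toward its cross-scale centroid.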


The final updated multi-scale dictionary is used as the new input to the CMP in the next iteration, until the termination condition is reached. Algorithm 2 gives a detailed description of the cross-scale atom clustering algorithm.

Algorithm 2: Dictionary learning with cross-scale atom clustering.
input parameters: input image patches {x_i}; multi-scale dictionary {D^l | l = 1, ..., L}.
repeat
    Cross-scale sparse coding:
        - Apply the CMP algorithm and calculate the coefficients α^l_i for each patch based on the current multi-scale dictionary {D^l | l = 1, ..., L}.
    Dictionary update:
    for each dictionary scale l = 1, ..., L do
        for each dictionary atom k = 1, ..., K do
            - Find the set of patches that use this atom: S^l_k = {i | α^l_i(k) ≠ 0}.
            - For each patch x^l_i ∈ S^l_k, compute the representation error: ε^l_i = x^l_i − Σ_{m≠k} d^l_m α^l_i(m).
            - Find the nearest interpolated/decimated atom in the other scale(s) via Eqn. (8), and set the clustering centroid δ^l_k = R_{lp} d^p_{kc}.
            - Set E^l_k as the matrix with columns {ε^l_i}_{i∈S^l_k}. Append δ^l_k, weighted by the tuning coefficient λ, as the last column of E^l_k to form the matrix M^l_k = [E^l_k, λ·δ^l_k], M^l_k ∈ R^{N_l×(Σ^l_k+1)}.
            - Apply the SVD M^l_k = UΔVᵀ; use the column of U and the first Σ^l_k elements of the column of V corresponding to the largest singular value in Δ to update d^l_k and {α^l_i(k)}_{i∈S^l_k}, respectively.
        end
    end
until J iterations have been completed;

In Algorithm 2, we enforce all the atoms at one scale to find matched atoms at another scale. This naturally raises a concern: what if some patches at one scale have no cross-scale similarities? Refer to Fig. 3, which illustrates the structure of the error matrix M^l_k ∈ R^{N_l×(Σ^l_k+1)}. The first Σ^l_k columns (bright) are data representation errors for a certain dictionary atom d^l_k; each column represents an image patch that selected d^l_k during CMP coding. The last (dark) column is the rescaled atom (via the scaling operator R_{lp}) from another dictionary scale (d^p_{kc}) that has been matched with d^l_k. For an atom pattern that is unique and popular within a single scale, meaning the atom is used frequently at that scale, the error matrix M^l_k will have more “bright” columns; so when M^l_k is later used for the SVD, the cross-scale matching column (dark) has less impact during the update, thus preserving the scale uniqueness. In conclusion, although the algorithm enforces cross-scale matching for every atom, the learning process varies with each atom's scale-dependent properties.

Fig. 3: Structure of matrix M^l_k. “Bright” columns are fidelity terms; the “dark” column is the cross-scale similarity term.

A further comment on the choice of a bilinear interpolator/decimator for the scaling operator R_{lp}: since dictionary update is an iterative process, a high-quality interpolation/decimation [30] [31] is unnecessary, and the bilinear operator is sufficient and efficient for MD/CAC.

VI. EXPERIMENTS AND RESULTS

We carry out experiments on the Cross-scale Matching Pursuit (CMP), the Cross-scale Cooperative Learning (MD/CCL), and the Cross-scale Atom Clustering (MD/CAC) algorithms to evaluate the proposed multi-scale sparse coding and dictionary learning scheme.

A. Dictionary Sparse Coding RMSE

With the multi-scale structure and added sparsity across thesignal scales, we expect the proposed multi-scale dictionary torepresent images with less error than single-scale dictionaries,provided that coefficient cardinalities (number of non-zeroelements allowed for each coefficient vector) are identical.Comparisons have been carried out between several competingmethods:• KSVD: a single-scale dictionary of dimension 64×256

(patch size: 8×8, atome number: 256) trained with theKSVD algorithm [1];

• MD: a 2-scale Dictionary of dimension 64×128/144×128 (patch sizes: 8×8/ 12×12, atom numbers: 128/128, 256 in total), trained with the KSVD algorithm foreach scale separately without any cross-scale interactions;

• MD/CCL: a 2-scale Dictionary of dimension 64×128/144×128 (patch sizes: 8×8/ 12×12, atom numbers: 128/128, 256 in total), trained with the proposed Cross-scaleCooperative Learning method;

• MD/CAC: also 2-scale Dictionary of dimension 64×128/144×128 (patch sizes: 8×8/ 12×12, atom numbers: 128/128, 256 in total), trained with the proposed Cross-scaleAltom Clustering method.

For a fair comparison, the atom numbers of the different dictionaries have been set to the same total of 256. The four dictionaries are used to code images under different cardinality limits. For KSVD, Orthogonal Matching Pursuit [28] is used for coding; for MD, MD/CCL, and MD/CAC, CMP is used. The coding Root Mean Square Errors (RMSE) are shown in Fig. 4, averaged over five input images: “barbara”, “peppers”, “lena”, “boat”, and “house”; all images are of dimension 512×512.

Fig. 4: Coding performance comparison in terms of RMSE between four different dictionaries. Data are the average of five images: “barbara”, “peppers”, “lena”, “boat” and “house”.

As is obvious from Fig. 4, the three 2-scale dictionaries all show a significant decrease in coding RMSE against the single-scale dictionary (by a factor of 2 or more), provided that the dictionary atom numbers as well as the coding cardinalities are equivalent. This experiment validates our expectation that a multi-scale dictionary structure provides a more powerful frame for describing visual signals. Among the three multi-scale counterparts, MD/CAC produces the best coding RMSE as the cardinality gets larger.
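For reference, the RMSE metric used in this comparison can be computed as below; this is an assumed formulation over vectorised patches, since the paper does not give the exact formula:

```python
import numpy as np

def coding_rmse(X, D, A):
    """RMSE of a sparse coding X ~ D @ A, with one vectorised patch per
    column of X and the corresponding coefficients per column of A."""
    return float(np.sqrt(np.mean((X - D @ A) ** 2)))
```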

B. Visual and Quantitative Evaluation of the Multi-scale Dictionary Learning Outcomes

1) Visual Illustration of the MD/CCL and MD/CAC Learning Outcomes: Using a 2-scale DCT dictionary (of dimensions 64×256 and 144×256) as the initial dictionary, and training patches from the image “barbara” (overlapping patches of sizes 8×8 and 12×12), we train the multi-scale dictionaries with the two methods proposed herein: MD/CCL and MD/CAC. Both training processes iterate between CMP coding and dictionary update for 10 iterations.

Fig. 5 shows the dictionary learning result for MD/CCL. Dictionary atoms from the two scales (shown in Fig. 5(a) and (c), respectively) have visually similar patterns. Fig. 5(b) zooms in on some atoms from the two scales that are related by the interpolation operator R_{lp}. As can be seen, the larger scale (12×12, bottom row of Fig. 5(b)) depicts the patterns in finer detail than the smaller scale (8×8, top row of Fig. 5(b)), while the smaller atoms can cooperate with the larger atoms to depict similar image patterns at a slightly lower resolution.

Fig. 5: 2-scale MD/CCL dictionary learned for the image “barbara”: (a) dictionary in scale 64×256, (c) dictionary in scale 144×256; (b) shows the zoomed-in parts taken from (a) and (c) in red rectangles.

Fig. 6: 2-scale MD/CAC dictionary for the image “barbara”: (a) dictionary in scale 64×256, (b) dictionary in scale 144×256; (c) and (d) are the zoomed-in parts taken from (a) and (b) in red dotted and solid rectangles respectively.

Fig. 6 shows the dictionary learning result for MD/CAC. The atom orders in the larger dictionary scale (12×12, in Fig. 6(b)) have been rearranged to their corresponding matched atoms’ positions in the smaller scale (8×8, in Fig. 6(a)) for better visual comparison. The effect of cross-scale clustering is obvious: Fig. 6(d) zooms in on the six atoms in solid rectangles in Fig. 6(a) and (b); these atoms have similar patterns but different resolutions. Fig. 6(c) highlights the six atoms in dotted rectangles in Fig. 6(a) and (b); these atoms have unique, scale-dependent patterns after dictionary training.

2) MD/CAC Atom Matching Property Evaluation: The atom matching scheme proposed for MD/CAC is neither injective nor surjective, i.e., multiple atoms from different scales could choose the same atom as their match, and some atoms might not get chosen at all during the dictionary update. An experiment was carried out on the image “barbara”, in which we calculated how many different (kc, lc) values there were for all the atoms across the scales after each dictionary update; the result is shown in Fig. 7. As a comparison, we



Fig. 7: Comparison of the number of matched atoms across different scales after each dictionary update iteration on the image “barbara”.

also calculated the total number of (kc, lc) values when no clustering is implemented (λ = 0 in Eqn. (9)).
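The counting in this experiment can be illustrated with a small sketch. Here the cross-scale match is computed by plain correlation after interpolation, a simplified stand-in for the matching defined via Eqn. (9), and `R` plays the role of the interpolation operator Rlp; all names are ours.

```python
import numpy as np

def count_matched(D_small, D_large, R):
    """Count how many distinct large-scale atoms are chosen as matches by the
    small-scale atoms. The mapping need not be injective or surjective, so the
    count can be anywhere between 1 and the number of small-scale atoms."""
    chosen = set()
    for k in range(D_small.shape[1]):
        up = R @ D_small[:, k]          # lift the small atom to the large scale
        up = up / np.linalg.norm(up)
        # best match = largest absolute correlation with the lifted atom
        chosen.add(int(np.argmax(np.abs(D_large.T @ up))))
    return len(chosen)
```

Running such a count after each dictionary update, with and without the clustering term, produces the two curves compared in Fig. 7.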

As can be seen from Fig. 7, on both occasions, with or without clustering, the number of matched atoms is high at iteration 1, because the initial multi-scale DCT dictionary has a diverse range of frequency variation. After several dictionary updates, atoms become more adapted to image-specific features. When this is done without cross-scale interaction, the matching number drops: natural image signals have a limited frequency band, so multiple atoms from one scale are easily matched to a few popular low-frequency atoms in another scale, hence the decrease of the blue curve in Fig. 7. The number remains high, however, when cross-scale interaction is involved (the red curve in Fig. 7). Pattern-specific similarities across the scales have been found and encouraged, which highlights the contribution of the proposed cross-scale matching scheme.

C. MD Parameter Analysis

1) Atom Numbers in Different Scales: One important parameter of the multi-scale dictionary is the number of atoms in each scale. A natural expectation is that dictionary scales with higher resolution should have more atoms, since they bear more information and hold a higher redundancy threshold. Such reasoning is correct when different dictionary scales are trained separately without cross-scale cooperation/interaction. However, we use the CMP coding method to choose among atoms in different scales for the best fit for each signal, i.e., CMP judges which scale is better at the task. In this regard, we carry out an experiment to calculate the atom selection ratio among the scales when CMP is used for sparse coding; this result should have the final say on how many atoms should be assigned to each scale.

Fig. 8 shows the experiment results for five different images. As can be seen from the figure, the smaller scale (8×8) is selected by CMP with much higher frequency than the

Fig. 8: Multi-scale dictionary scale selection frequency ratio during CMP for different images. (a) uses a 2-scale dictionary (patch sizes 8×8 and 12×12), and (b) uses a 3-scale dictionary (patch sizes 8×8, 12×12 and 16×16).

larger scales (12×12 for Fig. 8(a) and (b), and 16×16 for Fig. 8(b)). The frequency ratio is approximately 6:1 in favour of the smaller scales.
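Tallying the selection ratio is straightforward once each signal's per-scale target-function values (Eqn. (3)) are known; the sketch below assumes those values are given, and the names are illustrative.

```python
from collections import Counter

def choose_scale(costs):
    """Pick the scale with the smallest target-function value for one signal."""
    return min(costs, key=costs.get)

def selection_ratio(cost_list):
    """Fraction of signals for which each scale wins the cross-scale
    competition during coding."""
    counts = Counter(choose_scale(c) for c in cost_list)
    total = sum(counts.values())
    return {scale: n / total for scale, n in counts.items()}
```

Applied over all coded patches of an image, this yields the per-scale frequencies plotted in Fig. 8.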

Based on this data, it is safe to conclude that when CMP is used for sparse coding, if the atom number is sufficient for the smaller scale to be redundant, then the same atom number will be enough, if not too many, for the larger dictionary scale(s) to be sufficiently redundant as well. This justifies the proposed multi-scale dictionary structure, and such a structure facilitates sparse coding as well as dictionary learning in return.

2) Total Atom Number Across the Scales: We have already explained the reason for choosing equal atom numbers in different MD scales in the last experiment. Next, we evaluate the choice of the total atom number across the scales.

An experiment has been carried out on the image “barbara”, in which we calculate the CMP coding RMSE using learned MDs with different atom numbers. The RMSEs are calculated under different specified coding cardinalities. The MDs are of dimension (64×K, 144×K), where K is the variable atom number in each MD scale. All the MDs are trained using the proposed MD/CCL method. The result is shown in Fig. 9.

As can be seen from the figure, along the “Atom Number” axis, beyond K = 256 the RMSE decrease rate becomes much slower for most values along the “Cardinality” axis. Considering both representation efficiency and computational simplicity, we choose K = 256 for each MD scale in our current implementation.

D. Image Denoising Method and Result

The most direct application of sparse representation is signal denoising. In this section, we describe how to implement image denoising based on the proposed multi-scale dictionary structure. Both quantitative and qualitative performance comparisons with other state-of-the-art image denoising methods will be given.



Fig. 9: MD/CAC CMP coding performance analysis (in terms of RMSE) under different dictionary atom numbers and coding cardinalities.

1) The denoising method: Suppose a noise-free image is divided into patches of L different scales and vectorized as signals {x_i^l | x_i^l ∈ R^{N_l×1}, l = 1, 2, ..., L}. Image patches from different scales with the same index i share the same starting pixel. The image is corrupted with additive zero-mean white Gaussian noise n_i^l with standard deviation σ:

y_i^l = x_i^l + n_i^l. (13)

With the noisy observations {y_i^l | y_i^l ∈ R^{N_l×1}, l = 1, 2, ..., L}, we aim to find the best estimate of the original signal.
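For reference, the corruption model of Eqn. (13) can be simulated directly; function and variable names here are illustrative.

```python
import numpy as np

def add_noise(patches, sigma, rng=None):
    """Corrupt vectorized patches (one column per patch) with zero-mean white
    Gaussian noise of standard deviation sigma, i.e. y = x + n as in Eqn. (13)."""
    rng = np.random.default_rng() if rng is None else rng
    return patches + rng.normal(0.0, sigma, size=patches.shape)
```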

The denoising process is largely similar to the CMP algorithm, with a slight difference. A geometrical illustration of the multi-scale denoising process is depicted in Fig. 10. For easier spatial perception, a 2-scale dictionary with scale dimensionalities N_1 = 2 and N_2 = 3 is used. y_i^1 and y_i^2 are the corrupted signals at the same pixel location.

Fuzzy spheres are created around the vectors y_i^1 and y_i^2 with diameters δ_l:

δ_l = C × √(N_l) × σ, (14)

where σ is the noise intensity and C is a constant. Compared to noise, image signals usually live in a lower-dimensional subspace. For this case, the subspace is either a line Ω_{S_j}^1 for N_1 = 2, or a plane Ω_{S_j}^2 for N_2 = 3 (j = 1, 2, ...). Both subspaces are spanned by a sparse set of atoms S from D^1 and D^2 respectively in their corresponding dimensions.

The denoising algorithm uses “representation sparsity” and “small sparse coding error” as the two main priors for visual signal estimation. For each scale, the denoising algorithm chooses the subspace with the smallest support ||S||_0 that first enters the fuzzy sphere. The noise-free signal x_i^l is believed to lie within this sphere with high probability. The projection onto the chosen subspace Ω_S is the current scale’s prediction for x_i^l. The algorithm then chooses between scales to represent the current signals y_i^1, y_i^2 by finding the smallest target-function value defined in Eqn. (3).
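Under the threshold of Eqn. (14), the per-scale step can be sketched as a greedy pursuit that stops as soon as the residual enters the fuzzy sphere. This is an illustrative single-scale sketch, not the full CMP-based algorithm; the value of C and all names are assumptions of ours.

```python
import numpy as np

def sphere_threshold(N, sigma, C=1.15):
    """Fuzzy-sphere diameter from Eqn. (14): delta = C * sqrt(N) * sigma.
    The default C is purely illustrative; the paper treats C as a constant."""
    return C * np.sqrt(N) * sigma

def denoise_patch(D, y, delta):
    """Greedily grow the support until the residual enters the sphere of
    diameter delta around y; return the projection onto the chosen subspace,
    which serves as this scale's estimate of the clean patch."""
    residual = y.copy()
    support, coef = [], np.zeros(0)
    while np.linalg.norm(residual) > delta and len(support) < D.shape[1]:
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:          # residual already orthogonal to the support
            break
        support.append(k)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    return D[:, support] @ coef if support else np.zeros_like(y)
```

In the full method this estimate is computed per scale, and the scale with the smallest target-function value (Eqn. (3)) is kept.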

Fig. 10: A geometrical demonstration of the multi-scale coding process for the application of image denoising.

After sparse coding, the final image is reconstructed by averaging all overlapping patches. As the scale at each patch location may vary, each pixel can be covered a different number of times by neighboring patches of different scales. A matrix is maintained to record this count for each pixel during the sparse coding phase; it is later used as the weighting coefficient during the final averaging.
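The weighted-averaging reconstruction can be sketched as follows; function and variable names are illustrative, and square patches are assumed for simplicity.

```python
import numpy as np

def reconstruct(shape, patches, positions, sizes):
    """Average overlapping denoised patches of possibly different sizes.
    A per-pixel weight map records how many patches cover each pixel
    (the 'matrix' mentioned in the text) and normalizes the accumulated sum."""
    acc = np.zeros(shape)
    weight = np.zeros(shape)
    for patch, (r, c), n in zip(patches, positions, sizes):
        acc[r:r + n, c:c + n] += patch.reshape(n, n)
        weight[r:r + n, c:c + n] += 1.0
    weight[weight == 0] = 1.0   # avoid division by zero for uncovered pixels
    return acc / weight
```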

2) The performance: We use the aforementioned method to perform image denoising over different types of single-scale and multi-scale dictionaries to compare their performances.

First, we evaluate the advantage the proposed cross-scale learning methods provide when a multi-scale dictionary is used for image denoising. For that purpose, a multi-scale dictionary trained separately for each scale without any cross-scale interaction is used as reference. The result is shown in Fig. 11. The three 2-scale dictionaries listed for comparison are of the same dimensions (64×256 / 144×256), and the PSNRs shown are the averages of the denoising results on six different images: “barbara”, “peppers”, “lena”, “boat”, “house” and “straw”. As can be seen in Fig. 11, the denoising performance increases by approximately 0.2 dB or more on average at all noise levels when the multi-scale dictionary is trained with cross-scale interaction. Specifically, MD/CAC performs better at low noise levels (σ = 5), while MD/CCL performs better as the noise level increases. This result can be understood as follows. MD/CAC shows an advantage at low noise levels because it is more flexible and better at preserving cross-scale similarity and uniqueness at the same time, but only when noise is relatively low. When noise increases and becomes dominant, scale uniqueness decreases because the added noise is scale-irrelevant, and cross-scale similarity becomes more difficult to capture. MD/CCL, by contrast, fixes the atoms’ cross-scale matching relations and learns the atoms in a cooperative manner, i.e., signals from different scales all contribute to the learning outcome of the same atom; hence it is more resilient to larger noise. In spite of their different advantages at different noise levels, one conclusion



Fig. 11: Difference in denoising results between 2-scale (8×8 / 12×12) MD/CCL and MD/CAC, with MD (no cooperative learning or cross-scale clustering applied) as reference. Results are the average of six images: “barbara”, “peppers”, “lena”, “boat”, “house” and “straw”.

can be drawn: multi-scale dictionaries learned via cross-scale cooperation/interaction show a significant advantage in the application of image denoising over multi-scale dictionaries trained separately for each scale. The pursuit of cross-scale similarity and cross-scale sparsity is therefore justified.

Second, we evaluate the advantage of a multi-scale dictionary over a single-scale dictionary when used for image denoising. The result is shown in Fig. 12, where a single-scale KSVD dictionary (of dimension 64×256) is used as reference. The data in Fig. 12 are also the averages over the same six input images used in Fig. 11. A complete table of performance data is listed in Table I, where an additional entry for a single-scale KSVD dictionary (dimension 144×256) is also included for better comparison.

As shown in the figure and the table, the multi-scale dictionaries MD/CCL and MD/CAC show an average advantage of approximately 0.2 dB in denoising performance over the single-scale KSVD-trained dictionary. Notably, the 2-scale dictionary trained without cross-scale interaction (MD in Fig. 12) actually performs worse than its single-scale counterpart. This can be explained as follows: since MD shows overwhelmingly better efficiency in signal representation (already validated in the previous experiment in Fig. 4), this capability extends to noise representation as well. Therefore, the denoising performance of a multi-scale dictionary learned without cross-scale interaction is actually worse than that of a single-scale dictionary; this further demonstrates the necessity and effectiveness of the cross-scale interaction schemes herein proposed for MD/CCL and MD/CAC.

We also include the result from [25] for comparison. In [25] the authors proposed several additional algorithmic improvements during the sparse coding process (imposed preference for the DC component, OMP stopping criteria, etc.). In order to focus only on the advantage brought by the multi-scale scheme, we have carefully re-implemented their algorithm without the additional improvements. The patch sizes for each

Fig. 12: Difference in denoising results between 2-scale (8×8 / 12×12) MD/CCL, MD/CAC and MD, with single-scale KSVD (8×8) as reference. Results are the average of six images: “barbara”, “peppers”, “lena”, “boat”, “house” and “straw”.

noise level are: 10×10 for σ = 5; 12×12 for σ = 10; 16×16 for σ = 15, 20 and 25; and 20×20 for σ = 50. The results are listed in Table I. It can be seen from the table that the proposed multi-scale scheme outperforms the method in [25], except for a few cases at larger noise levels (σ = 50).
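The patch-size schedule above can be captured in a small helper; only the listed σ values are given in the text, so the boundaries between them are an assumption of this sketch.

```python
def patch_size_for_sigma(sigma):
    """Patch side length used in our re-implementation of [25], per noise level.
    Listed values: 10x10 for sigma=5; 12x12 for sigma=10; 16x16 for
    sigma=15, 20, 25; 20x20 for sigma=50. Boundaries in between are assumed."""
    if sigma <= 5:
        return 10
    if sigma <= 10:
        return 12
    if sigma <= 25:
        return 16
    return 20
```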

A visual comparison of these competing denoising methods is provided in Fig. 13 and Fig. 14. Fig. 13 shows the result for the image “straw” contaminated with Gaussian noise of σ = 15. The image contains an enormous number of cross-scale patterns. As can be seen, the proposed multi-scale methods in Fig. 13(e) and (f) preserve the straw edges and texture details better than the other methods. Fig. 14 gives another visual example on the image “barbara” contaminated with Gaussian noise of σ = 15. The cross-scale mechanism introduced herein also helps to remove noise in textureless parts of the image better than the other methods, as can be seen from the zoomed-in part of the wall in the background.

VII. CONCLUSION: SIMILARITY OR UNIQUENESS

In this paper, we proposed a multi-scale dictionary structure, a multi-scale sparse coding algorithm (CMP), and two different types of cross-scale dictionary learning methods: cross-scale cooperative learning (MD/CCL) and cross-scale atom clustering (MD/CAC). Our aim is to study the cross-scale interactions of a multi-scale dictionary and its applications. The proposed multi-scale dictionary, along with the CMP coding algorithm, shows better performance in sparse representation than single-scale dictionaries. Experiments also showed its advantage in image denoising.

The focus of this paper is the cross-scale interaction of the multi-scale dictionary. MD/CCL is better at enforcing cross-scale similarity because of its fixed atom matching relations; MD/CAC is more flexible in representing scale-unique patterns due to its re-calculation of clustering centers during each dictionary update. The two attributes, similarity



TABLE I: Denoising results for six test images with added zero-mean white Gaussian noise of different intensities ranging from 5 to 50. The six data listed for each table entry are, top row from left to right: (1) single-scale KSVD dictionary (dimension 64×256); (2) single-scale KSVD dictionary (dimension 144×256); (3) 2-scale dictionary trained by KSVD separately for each scale; bottom row from left to right: (4) 2-scale dictionary proposed in [25]; (5) 2-scale dictionary trained by MD/CCL; (6) 2-scale dictionary trained by MD/CAC. Best performances are in bold.

σ / PSNR            5 / 34.15           10 / 28.13          15 / 24.61          20 / 22.11          25 / 20.17          50 / 14.15

barbara  (1)(2)(3)  38.02 36.80 38.09 | 34.59 34.14 34.53 | 32.37 32.28 32.39 | 30.87 30.90 30.81 | 29.60 29.68 29.52 | 25.47 26.39 25.58
         (4)(5)(6)  38.19 38.22 38.21 | 34.62 34.70 34.60 | 32.47 32.61 32.50 | 30.95 31.01 30.93 | 29.94 29.76 29.62 | 26.53 26.12 25.78
peppers  (1)(2)(3)  37.98 36.91 38.00 | 35.01 34.86 34.96 | 33.50 33.29 33.47 | 32.39 32.16 32.34 | 31.48 31.36 31.47 | 28.22 28.50 28.25
         (4)(5)(6)  38.06 38.02 38.07 | 35.12 35.03 35.10 | 33.51 33.60 33.57 | 32.37 32.46 32.42 | 31.49 31.59 31.56 | 28.38 28.47 28.37
lena     (1)(2)(3)  38.54 37.68 38.58 | 35.49 34.91 35.42 | 33.67 33.40 33.52 | 32.34 32.24 32.24 | 31.32 31.29 31.22 | 27.81 28.12 27.69
         (4)(5)(6)  38.60 38.61 38.62 | 35.58 35.64 35.51 | 33.71 33.81 33.65 | 32.46 32.52 32.35 | 31.48 31.45 31.30 | 28.24 28.18 28.02
boat     (1)(2)(3)  37.04 35.66 37.13 | 33.68 33.25 33.65 | 31.78 31.48 31.72 | 30.36 30.38 30.29 | 29.28 29.12 29.17 | 26.01 26.14 25.93
         (4)(5)(6)  37.31 37.10 37.17 | 33.71 33.74 33.69 | 31.67 31.87 31.77 | 30.37 30.50 30.38 | 29.27 29.35 29.23 | 26.21 26.33 26.07
house    (1)(2)(3)  43.27 43.27 42.71 | 39.11 39.40 38.73 | 36.90 37.41 36.90 | 35.70 35.96 35.53 | 34.51 34.70 34.49 | 30.75 31.49 30.93
         (4)(5)(6)  42.96 43.37 43.40 | 39.09 39.40 39.44 | 37.22 37.45 37.52 | 35.81 36.14 36.16 | 34.77 34.90 34.93 | 31.68 31.53 31.39
straw    (1)(2)(3)  30.89 24.71 30.09 | 28.38 24.26 28.03 | 26.09 23.80 26.18 | 24.43 23.10 24.49 | 23.20 22.34 23.29 | 19.83 19.32 19.81
         (4)(5)(6)  31.30 32.11 32.13 | 28.29 29.06 28.64 | 24.01 26.50 26.20 | 23.49 24.71 24.40 | 22.81 23.34 23.10 | 19.39 19.80 19.53
average  (1)(2)(3)  37.62 35.84 37.43 | 34.38 33.47 34.22 | 32.39 31.94 32.36 | 31.02 30.79 30.95 | 29.91 29.79 29.86 | 26.35 26.66 26.36
         (4)(5)(6)  37.74 37.91 37.93 | 34.40 34.60 34.50 | 32.10 32.64 32.54 | 30.91 31.22 31.11 | 29.95 30.06 29.96 | 26.74 26.74 26.51

Fig. 13: Denoising results on the image “straw” with Gaussian noise σ = 15. (a) is the noisy image; the part in the red rectangle is zoomed in from (b) to (f), which are the denoising results using (b) K-SVD (64×256, 26.09 dB), (c) K-SVD (144×256, 23.80 dB), (d) the 2-scale dictionary proposed in [25] (24.01 dB), (e) 2-scale MD/CCL (26.50 dB) and (f) 2-scale MD/CAC (26.20 dB).



Fig. 14: Denoising results on the image “barbara” with Gaussian noise σ = 15. (a) noisy image; the part in the red rectangle is zoomed in from (b) to (f), which are the denoising results using (b) K-SVD (64×256, 32.37 dB), (c) K-SVD (144×256, 32.28 dB), (d) the 2-scale dictionary proposed in [25] (32.47 dB), (e) 2-scale MD/CCL (32.61 dB) and (f) 2-scale MD/CAC (32.50 dB).

and uniqueness, that characterize the two proposed cross-scale dictionary learning methods showed their respective advantages when used in image denoising applications at different noise levels. The compromise between the two attributes in a multi-scale dictionary is vital not just in image denoising, but also in many other multi-scale scenarios.

In conclusion, the multi-scale dictionaries herein proposed are advantageous over single-scale dictionaries in sparse coding and signal denoising, and the cross-scale interaction during dictionary learning is vital in exploiting such advantages.

REFERENCES

[1] M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Transactions on Signal Processing, vol. 54, no. 11, 2006.

[2] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,” SIAM Journal on Scientific Computing, vol. 20, no. 1, pp. 33–61, 1998.

[3] R. Rubinstein, A. M. Bruckstein, and M. Elad, “Dictionaries for sparse representation modeling,” Proceedings of the IEEE: Special Issue on Applications of Sparse Representation and Compressive Sensing, vol. 98, no. 6, 2010.

[4] S. G. Mallat, A Wavelet Tour of Signal Processing: The Sparse Way. Amsterdam; Boston: Elsevier/Academic Press, 2009.

[5] I. Daubechies, Ten Lectures on Wavelets. Society for Industrial and Applied Mathematics, 1992.

[6] A. Pizurica, W. Philips, I. Lemahieu, and M. Acheroy, “A joint inter- and intra-scale statistical model for Bayesian wavelet based image denoising,” IEEE Transactions on Image Processing, vol. 11, no. 5, 2002.

[7] L. Sendur and I. Selesnick, “Bivariate shrinkage functions for wavelet-based denoising exploiting interscale dependency,” IEEE Transactions on Signal Processing, vol. 50, no. 11, 2002.

[8] J. Portilla, V. Strela, M. Wainwright, and E. Simoncelli, “Image denoising using scale mixtures of Gaussians in the wavelet domain,” IEEE Transactions on Image Processing, vol. 12, no. 11, 2003.

[9] A. Pizurica and W. Philips, “Estimating the probability of the presence of a signal of interest in multiresolution single- and multiband image denoising,” IEEE Transactions on Image Processing, vol. 15, no. 3, 2006.

[10] G. Peyre, “A review of adaptive image representations,” IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 5, 2011.

[11] M. Elad, R. Goldenberg, and R. Kimmel, “Low bit-rate compression of facial images,” IEEE Transactions on Image Processing, vol. 16, no. 9, 2007.

[12] J. Hou, L. Chau, Y. He, D. T. P. Quynh, and N. Magnenat-Thalmann, “Dynamic 3-D facial compression using low rank and sparse decomposition,” in ACM SIGGRAPH Asia, Singapore, 2012.

[13] M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Transactions on Image Processing, vol. 15, no. 12, 2006.

[14] J. Yang, Z. Wang, Z. Lin, S. Cohen, and T. Huang, “Coupled dictionary training for image super-resolution,” IEEE Transactions on Image Processing, vol. 21, no. 8, pp. 3467–3478, 2012.

[15] J. Hou, L. Chau, Y. He, and N. Magnenat-Thalmann, “Human motion capture data recovery via trajectory-based sparse representation,” in IEEE International Conference on Image Processing, 2013.

[16] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, 2009.

[17] Y. Wu, B. Ma, M. Yang, Y. Jia, and J. Zhang, “Metric learning based structural appearance model for robust visual tracking,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 5, 2014.

[18] C. H. Chan and J. Kittler, “Sparse representation of (multiscale) histograms for face recognition robust to registration and illumination problems,” in IEEE International Conference on Image Processing. IEEE, 2010, pp. 2441–2444.

[19] F. Zhu and L. Shao, “Weakly-supervised cross-domain dictionary learning for visual recognition,” International Journal of Computer Vision, vol. 109, no. 1-2, pp. 42–59, 2014.

[20] B. Ophir, M. Lustig, and M. Elad, “Multi-scale dictionary learning using wavelets,” IEEE Journal of Selected Topics in Signal Processing, vol. 5, 2011.

[21] R. Yan, L. Shao, and Y. Liu, “Nonlocal hierarchical dictionary learning using wavelets for image denoising,” IEEE Transactions on Image Processing, vol. 22, no. 12, pp. 4689–4698, Dec. 2013.

[22] K. Skretting and K. Engan, “Image compression using learned dictionaries by RLS-DLA and compared with K-SVD,” in IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2011.

[23] J. Hughes, D. Rockmore, and Y. Wang, “Bayesian learning of sparse multiscale image representations,” IEEE Transactions on Image Processing, vol. 22, no. 12, 2013.

[24] Q. Liu, J. Luo, S. Wang, M. Xiao, and M. Ye, “An augmented Lagrangian multi-scale dictionary learning algorithm,” EURASIP Journal on Advances in Signal Processing, vol. 2011, no. 1, 2011.

[25] J. Mairal, G. Sapiro, and M. Elad, “Multiscale sparse image representation with learned dictionaries,” in IEEE International Conference on Image Processing, vol. 3, 2007.

[26] ——, “Learning multiscale sparse representations for image and video restoration,” DTIC Document, Tech. Rep., 2007.

[27] M. Aharon and M. Elad, “Sparse and redundant modeling of image content using an image-signature-dictionary,” SIAM Journal on Imaging Sciences, vol. 1, no. 3, 2008.

[28] Y. Pati, R. Rezaiifar, and P. Krishnaprasad, “Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition,” in Asilomar Conference on Signals, Systems and Computers, Nov. 1993.

[29] D. P. Bertsekas, Nonlinear Programming, 1999.

[30] N. Kulkarni, P. Nagesh, R. Gowda, and B. Li, “Understanding compressive sensing and sparse representation-based super-resolution,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 5, pp. 778–789, 2012.

[31] X. Lu, Y. Yuan, and P. Yan, “Image super-resolution via double sparsity regularized manifold learning,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 23, no. 12, 2013.

Jie Chen received the B.S. and M.Eng. degrees from the School of Optical and Electronic Information, Huazhong University of Science and Technology, China. He is currently pursuing the Ph.D. degree in the School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore.

His research interests are in image processing (low contrast image processing, denoising, interpolation), image sparse representation and applications, and computational photography.

Lap-Pui Chau received the B.Eng. degree with first class honours in Electronic Engineering from Oxford Brookes University, England, and the Ph.D. degree in Electronic Engineering from Hong Kong Polytechnic University, Hong Kong, in 1992 and 1997, respectively. In June 1996, he joined Tritech Microelectronics as a senior engineer. In March 1997, he joined the Centre for Signal Processing, a national research centre in Nanyang Technological University, as a research fellow; subsequently he joined the School of Electrical & Electronic Engineering, Nanyang Technological University, as an assistant professor, and currently he is an associate professor. He was a Technical Program Co-Chair for Visual Communications and Image Processing (VCIP 2013) and the 2010 International Symposium on Intelligent Signal Processing and Communications Systems (ISPACS 2010). He was the chair of the Technical Committee on Circuits & Systems for Communications (TC-CASC) of the IEEE Circuits and Systems Society from 2010 to 2012. He served as an associate editor for IEEE Transactions on Multimedia and IEEE Signal Processing Letters, and is currently serving as an associate editor for IEEE Transactions on Circuits and Systems for Video Technology, IEEE Transactions on Broadcasting and the IEEE Circuits and Systems Society Newsletter. Besides, he was an IEEE Distinguished Lecturer for 2009–2013, and a steering committee member of IEEE Transactions on Mobile Computing from 2011 to 2013.