Tree-based Algorithms for Compressed Sensing with Sparse-Tree Prior
Chinh N. H. La and Minh N. Do
Abstract
Recent studies have shown that sparse representation can be used effectively as a prior in linear
inverse problems. However, in many multiscale bases (e.g. wavelets), signals of interest (e.g. piecewise-
smooth signals) not only have few significant coefficients, but also those significant coefficients are
well-organized in trees. We propose to exploit this prior, named sparse-tree, for linear inverse problems
with limited numbers of measurements. Toward this end, we present two efficient and effective algorithms
named Tree-based Orthogonal Matching Pursuit (TOMP) and Tree-based Majorization-Minimization
(TMM). Numerical results show that tree-based algorithms provide significantly better reconstruction
quality compared to methods relying only on the sparse prior.
Index Terms
compressed sensing, linear inverse problems, sparse representations, sparse-tree prior, tree structures,
wavelets, greedy algorithms, tree-based orthogonal matching pursuit, tree-based majorization-minimization.
I. INTRODUCTION
A recent series of remarkable works has shown that the sparse representation of an unknown signal
in a certain basis can be used effectively as prior knowledge for signal reconstruction from a limited
number of measurements [1], [2], [3], [4], [5], [6]. An unknown signal of length $N$ is said to have sparse
C. N. H. La was with the Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign,
Urbana IL 61801. He is now with Intel. (email: [email protected]).
M. N. Do is with the Department of Electrical and Computer Engineering, the Coordinated Science Laboratory, and the
Beckman Institute, University of Illinois at Urbana-Champaign, Urbana IL 61801 (email: [email protected]).
This work was supported by the US National Science Foundation under Grant CCF 06-35234 and a Vietnam Education
Foundation Fellowship.
March 31, 2009 DRAFT
representation if it can be well approximated by only $K$ significant coefficients, where $K \ll N$, in a
fixed basis. Significant coefficients are defined as those with magnitude greater than a certain threshold.
However, the locations of these $K$ significant coefficients are unknown and depend on the signal. It has
been shown in [3], [4], [5], [6] that under certain conditions, with high probability, the unknown signal
can be reconstructed accurately using only $O(K \log N)$ nonadaptive linear measurements. These new
results could have great impact in a wide range of inverse problems, from remote sensing to medical
imaging, where, due to physical constraints, only a limited number of measurements can be acquired
from the unknown object.
Figure 1 (a) and (b) demonstrate the power of sparse representations via accurate reconstruction
of a length $N = 256$ signal by keeping only the $K = 32$ most significant wavelet coefficients.
[Figure 1 omitted: three panels (a)-(c), described in the caption below.]
Fig. 1. Example of sparse representation: (a) original signal with $N = 256$ samples; (b) wavelet approximation using $K = 32$
coefficients; (c) wavelet coefficients.
However, in many multiscale bases (e.g., wavelets), objects of interest (e.g., piecewise-smooth signals)
have significant coefficients that are not only few in number, but also well-organized in trees. Figure 1 (c)
shows an example of these cases. In fact, the embedded tree structures for significant wavelet coefficients
have been used successfully in compression [7], [8], modeling [9], [10], and approximation [11].
Intuitively, a general sparse representation with $K$ coefficients can be described with $2K$ numbers: $K$
for the values of the significant coefficients and another $K$ for the indexing of these coefficients. If the
$K$ significant coefficients are known to be organized in trees, then the indexing cost and hence the total
description of the unknown signal can be significantly reduced.
We propose to exploit the sparse-tree representation as additional prior information for signal reconstruction
with a limited number of measurements. Exploiting this embedded tree structure in addition
to the sparse prior in inverse problems would potentially lead to: (1) better reconstructed signals, (2)
reconstruction using fewer measurements, and (3) faster reconstruction algorithms.
In [12], [13] we presented a preliminary version of the Tree-based Orthogonal Matching Pursuit (TOMP) algorithm
for signal reconstruction using the sparse-tree prior. The idea of using a sparse-tree representation for signal
reconstruction was also independently developed in [14], with the main focus on obtaining fast reconstruction
algorithms. Our focus is to exploit the sparse-tree representation to obtain better reconstruction
compared to existing methods such as Orthogonal Matching Pursuit (OMP) [6], Basis Pursuit (BP) [4], [5],
and iterated shrinkage or Majorization-Minimization (MM) [15], [16], [17], which use only the sparse prior.
In [18] we extended the MM algorithm to exploit the sparse-tree prior, leading to the Tree-based
Majorization-Minimization (TMM) algorithm. A theoretical analysis of model-based compressive sensing, including
the sparse-tree prior, was recently developed in [19].
The outline of the paper is as follows. We review the existing sparse inverse problem and compressed
sensing reconstruction algorithms in Section II. We formulate the sparse-tree inverse problem in Section
III. The TOMP algorithm is presented in Section IV and the TMM algorithm is presented in Section V.
Numerical experiments in Section VI show the superior performance of the tree-based modified algorithms
over the original ones. We conclude the paper in Section VII.
II. BACKGROUND
A. Compressed Sensing and Sparse Inverse Problem
For an unknown signal $\mathbf{y}$ of length $N$, suppose that we can only acquire a limited number $M$ of
non-adaptive linear measurements ($M \ll N$):
$$\mathbf{M}\mathbf{y} = \mathbf{b}, \tag{1}$$
where $\mathbf{M}$ is a fixed $M \times N$ measurement matrix, and $\mathbf{b}$ is a length-$M$ vector that contains the measured
data. The inverse problem is to reconstruct the signal $\mathbf{y}$ from $\mathbf{b}$.
Suppose that $\mathbf{y}$ has a sparse expansion via a fixed $N \times N$ transform matrix $\mathbf{W}$ as
$$\mathbf{y} = \mathbf{W}\mathbf{x}, \tag{2}$$
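where at most $K$ out of the $N$ entries of $\mathbf{x}$ ($K \ll N$) are nonzero or significant. Let $\mathbf{A} = \mathbf{M}\mathbf{W}$, which is
also a known $M \times N$ matrix; then the inverse problem (1) becomes
$$\mathbf{A}\mathbf{x} = \mathbf{b}, \tag{3}$$
and substituting $\mathbf{x}$ into (2) gives $\mathbf{y}$.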
Since $M \ll N$, both (1) and (3) are underdetermined systems. Thus, in order to solve this inverse
problem we need to exploit the sparse prior on $\mathbf{x}$. Intuitively, we can recover $\mathbf{x}$ from fewer
measurements since most of the coefficients are insignificant and can be discarded. The sparsity of $\mathbf{x}$ is
measured by its number of nonzero coefficients:
$$S(\mathbf{x}) = \|\mathbf{x}\|_0 = \mathrm{size}\{i : x_i \neq 0\}. \tag{4}$$
To solve (3), current methods use the sparsity prior and aim to search for the sparsest solution:
$$\min_{\mathbf{x}} S(\mathbf{x}) \quad \text{subject to} \quad \mathbf{A}\mathbf{x} = \mathbf{b}. \tag{5}$$
We refer to (5) as the sparse inverse problem.
B. Existing Reconstruction Algorithms
Since (5) is an NP-hard problem [20], one approach is to use a greedy search method such as
Orthogonal Matching Pursuit (OMP) [20], [6]. In the OMP method, the measurement matrix $\mathbf{A}$ is
considered as a dictionary whose columns $\mathbf{a}_i$ are the atoms. Throughout the paper, we assume that the $\mathbf{a}_i$ are
normalized such that $\|\mathbf{a}_i\|_2 = 1$. Each OMP iteration searches the dictionary for an atom corresponding
to a significant coefficient in $\mathbf{x}$ and estimates the value of this coefficient by means of orthogonal
projections. Specifically, starting with $\mathbf{r}_0 = \mathbf{b}$, at the $k$-th iteration the OMP algorithm selects the $k$-th
atom as
$$i_k = \arg\max_{1 \le i \le N} |\langle \mathbf{r}_{k-1}, \mathbf{a}_i \rangle|, \tag{6}$$
and updates the residual as
$$\mathbf{r}_k = \mathbf{b} - P_{\mathrm{span}\{\mathbf{a}_{i_1}, \mathbf{a}_{i_2}, \ldots, \mathbf{a}_{i_k}\}} \mathbf{b}, \tag{7}$$
where $P_S$ denotes the orthogonal projection onto a subspace $S$.
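The OMP iteration (6)-(7) can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions: the function names `dot` and `omp`, the list-of-columns representation, and the tolerance are our own choices, and the projection in (7) is realized by Gram-Schmidt orthogonalization of the selected atoms.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def omp(columns, b, max_atoms, tol=1e-12):
    """Greedy OMP sketch; returns (selected column indices, final residual).

    columns: list of unit-norm atoms a_i (each a list of floats).
    """
    residual = list(b)
    selected = []
    basis = []  # orthonormal basis of the span of the selected atoms
    while len(selected) < max_atoms and dot(residual, residual) > tol:
        # Selection rule (6): atom with the largest |<r_{k-1}, a_i>|.
        scores = [abs(dot(residual, a)) for a in columns]
        i_k = max(range(len(columns)), key=scores.__getitem__)
        # Orthogonalize the new atom against the atoms selected so far.
        v = list(columns[i_k])
        for u in basis:
            c = dot(v, u)
            v = [vi - c * ui for vi, ui in zip(v, u)]
        norm = math.sqrt(dot(v, v))
        if norm < 1e-12:  # atom already lies in the selected span
            break
        u_new = [vi / norm for vi in v]
        basis.append(u_new)
        selected.append(i_k)
        # Residual update (7): the old residual is orthogonal to the previous
        # basis, so only the component along u_new has to be removed.
        c = dot(residual, u_new)
        residual = [ri - c * ui for ri, ui in zip(residual, u_new)]
    return selected, residual

# Toy dictionary in R^3 with four unit-norm atoms (hypothetical data).
s = 1.0 / math.sqrt(3.0)
columns = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [s, s, s]]
selected, residual = omp(columns, [3.0, 1.0, 0.0], max_atoms=3)
```

In this toy example the data vector is a combination of the first two atoms, and the sketch selects exactly those two before the residual vanishes.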
There are several improved versions of OMP, including StOMP [21], CoSaMP [22], and Subspace
Pursuit [23], where each iteration selects more than one atom and could include a backtracking or
pruning step.
As an alternative to OMP, the Basis Pursuit or $\ell_1$-minimization approach [24], [3], [4], [5] relaxes the $\ell_0$-norm
in problem (5) to the convex $\ell_1$-norm; that is, it solves
$$\mathbf{x}_{BP} = \arg\min_{\mathbf{x}} \|\mathbf{x}\|_1 \quad \text{subject to} \quad \mathbf{A}\mathbf{x} = \mathbf{b}. \tag{8}$$
The $\ell_1$-minimization problem (8) can be solved by linear programming or interior-point methods.
Finally, there is a class of algorithms called iterated shrinkage or Majorization-Minimization [15], [16],
[17] that solve the equivalent regularization problem
$$\mathbf{x} = \arg\min_{\mathbf{x}} \|\mathbf{A}\mathbf{x} - \mathbf{b}\|_2^2 + \mu \|\mathbf{x}\|_1, \tag{9}$$
through iterations, each of which consists of two steps:
$$\mathbf{z} = \mathbf{x}^{(t)} + \lambda^{-1} \mathbf{A}^T (\mathbf{b} - \mathbf{A}\mathbf{x}^{(t)}), \tag{10}$$
$$x_n^{(t+1)} = \mathrm{sign}(z_n) \cdot \max\{0, |z_n| - \mu/\lambda\}, \tag{11}$$
where $\lambda$ is the maximum eigenvalue of $\mathbf{A}^T\mathbf{A}$. The first step is a gradient descent step for solving $\mathbf{A}\mathbf{x} = \mathbf{b}$,
while the second step is a componentwise soft thresholding for enforcing sparsity.
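The two-step iteration (10)-(11) can be written down directly. The following is a minimal sketch, assuming a small dense matrix stored as a list of rows; the helper names (`mat_vec`, `mat_t_vec`, `soft_threshold`, `iterated_shrinkage`) and the fixed iteration count are our own illustrative choices, and the threshold $\mu/\lambda$ is taken exactly as written in (11).

```python
def mat_vec(A, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def mat_t_vec(A, r):
    """Compute A^T r for a matrix stored as a list of rows."""
    n_cols = len(A[0])
    return [sum(A[m][j] * r[m] for m in range(len(A))) for j in range(n_cols)]

def soft_threshold(z, t):
    """Componentwise soft thresholding, as in (11)."""
    out = []
    for zn in z:
        mag = max(0.0, abs(zn) - t)
        out.append(mag if zn >= 0 else -mag)
    return out

def iterated_shrinkage(A, b, mu, lam, n_iter=50):
    """Run n_iter iterations of (10)-(11), starting from x = 0."""
    x = [0.0] * len(A[0])
    for _ in range(n_iter):
        r = [bi - yi for bi, yi in zip(b, mat_vec(A, x))]
        g = mat_t_vec(A, r)
        z = [xi + gi / lam for xi, gi in zip(x, g)]  # step (10)
        x = soft_threshold(z, mu / lam)              # step (11)
    return x

# With A equal to the identity and lam = 1, step (10) gives z = b, so the
# iteration reduces to a single soft thresholding of b.
x_hat = iterated_shrinkage([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], mu=1.0, lam=1.0)
```

For a large problem, the two matrix products would be replaced by fast operators for $\mathbf{A}$ and $\mathbf{A}^T$; the structure of the loop stays the same.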
III. SPARSE-TREE INVERSE PROBLEM
A. The Sparse-Tree Characteristic of Signals
If the transform in (2) is an $L$-level 1-D wavelet transform, then the entries of $\mathbf{x}$ consist of
$$\left\{ \{x^{(s)}_{L,p}\}_{1 \le p \le N/2^L},\ \{x^{(w)}_{l,p}\}_{1 \le l \le L,\ 1 \le p \le N/2^l} \right\},$$
where $x^{(s)}_{L,p}$ are the scaling coefficients and $x^{(w)}_{l,p}$ are the wavelet coefficients. These coefficients are
arranged in a tree structure [25]. In the notation $x^{(w)}_{l,p}$, $l$ is the level and $p$ is the index in that level. Hence
the vector $\mathbf{x}$ resulting from the 1-D wavelet transform can be arranged as a binary tree, as in Figure 2. A
wavelet coefficient $x^{(w)}_{l,p}$ has two children, $x^{(w)}_{l-1,2p-1}$ and $x^{(w)}_{l-1,2p}$. Thus, the entries in $\mathbf{x}$ can be specified
either in the vector form $x_i$ or in the tree-structured form $x_{l,p}$. The binary tree structure of the 1-D wavelet
transform can be extended to multi-dimensional wavelet transforms, such as a quad-tree for images in 2-D.
Examining the wavelet representation $\mathbf{x}$ of a piecewise-smooth signal, as in Figure 1(c), we identify
the following two distinguishing properties that together form the sparse-tree prior:
P1 Vector $\mathbf{x}$ has a sparse structure; i.e., only a few entries in $\mathbf{x}$ are nonzero or significant.
P2 Those nonzero or significant entries of $\mathbf{x}$ are well connected in a tree structure (see below).
P1 is the commonly used prior in compressed sensing, as discussed in Section II-A. However, P2 is an
important additional prior that has not been considered by the current inverse problems (5), (8), (9). Property
[Figure 2 omitted: tree diagram with labels Level 3, Level 2, Level 1, Level 0, wavelet coeff., and scaling coeff.; black node: significant coefficient, white node: insignificant coefficient.]
Fig. 2. The tree structures of an example 1-D wavelet coefficient set: black nodes represent significant coefficients; white
nodes represent insignificant coefficients.
P2 holds because significant wavelet coefficients in $\mathbf{x}$ correspond to discontinuities in the original signal
$\mathbf{y}$ [25], and each discontinuity generates a set of nonzero wavelet coefficients in a “cone of influence”
referred to as a wavelet footprint [26]. In particular, if a coefficient is nonzero or significant then its ancestors
are likely nonzero or significant. As mentioned before, a coefficient is considered significant when its
magnitude is larger than a certain threshold $\epsilon_0$. If the signal is strictly sparse we can let $\epsilon_0 = 0$, meaning that
nonzero entries are significant. Property P2 implies that if either $x^{(w)}_{l-1,2p-1}$ or $x^{(w)}_{l-1,2p}$ is significant, then
$x^{(w)}_{l,p}$ is likely significant. Therefore, the significant coefficients of $\mathbf{x}$ themselves form a connected tree, as
illustrated in Figure 2.
B. The Sparse-Tree Inverse Problem
We denote by $H(i)$ the index set of the history nodes of node $i$, that is, all nodes on the tree branch
from node $i$ up to the root level. If $I$ is an index set, then $\mathbf{x}_I$ denotes the set of entries in $\mathbf{x}$ with
indexes from $I$:
$$\mathbf{x}_I = \{x_i : x_i \in \mathbf{x},\ i \in I\}. \tag{12}$$
Let $T(\mathbf{x})$ be the smallest connected tree that contains the indexes of all the nonzero coefficients of $\mathbf{x}$:
$$T(\mathbf{x}) = \bigcup_{x_i \neq 0} H(i). \tag{13}$$
As in (4), the size of the tree $T(\mathbf{x})$ is measured by
$$S_T(\mathbf{x}) = \mathrm{size}(T(\mathbf{x})). \tag{14}$$
Comparing with the sparsity measure (4), we see that if the nonzero entries of $\mathbf{x}$ are connected in a tree then
$S_T(\mathbf{x}) = S(\mathbf{x})$. Otherwise, $S_T(\mathbf{x}) \gg S(\mathbf{x})$.
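The definitions (13)-(14) can be made concrete with the $(l, p)$ indexing of Section III-A. The sketch below is illustrative only: it handles the wavelet coefficients alone (the scaling coefficients are left out), treats level $L$ as the root level, and the function names `parent`, `history`, and `smallest_tree` are our own.

```python
def parent(l, p):
    """Parent of node (l, p): the children of (l, p) are (l-1, 2p-1) and (l-1, 2p)."""
    return (l + 1, (p + 1) // 2)

def history(l, p, L):
    """H(i): node (l, p) together with all its ancestors up to the root level L."""
    nodes = set()
    while l <= L:
        nodes.add((l, p))
        l, p = parent(l, p)
    return nodes

def smallest_tree(nonzeros, L):
    """T(x) in (13): union of the histories of all nonzero positions."""
    tree = set()
    for (l, p) in nonzeros:
        tree |= history(l, p, L)
    return tree

# Nonzeros that already form a branch: S_T(x) = S(x) = 3.
connected = smallest_tree({(3, 1), (2, 2), (1, 3)}, L=3)
# Two isolated fine-scale nonzeros: S(x) = 2 but S_T(x) = 4.
isolated = smallest_tree({(1, 3), (1, 4)}, L=3)
```

The second case shows how $S_T(\mathbf{x})$ grows beyond $S(\mathbf{x})$ when the ancestors of the nonzero entries are not themselves nonzero.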
While current inverse problems rely only on the sparse prior P1, we observe that the tree characteristic
P2 offers an additional important prior. This property P2 leads us to replace the term $S(\mathbf{x})$ in (5) by
$S_T(\mathbf{x})$ and form the following problem:
$$\min_{\mathbf{x}} S_T(\mathbf{x}) \quad \text{subject to} \quad \mathbf{A}\mathbf{x} = \mathbf{b}. \tag{15}$$
The objective of (15) is to find the smallest connected tree containing all the nonzero entries of $\mathbf{x}$ that
solves $\mathbf{A}\mathbf{x} = \mathbf{b}$. We refer to this problem as the sparse-tree inverse problem, in connection to the sparse
inverse problem in (5).
Problem (15) is applicable to compressible signals that have well-connected significant entries in a
certain domain. The most typical and representative cases are piecewise-smooth signals in the wavelet
domain. Problem (15) can also be applied to signals whose structures approximate connected trees, as
will be seen later in the numerical experiments.
C. Justifying the Sparse-Tree Prior Using a Coding Argument
The term $S_T(\mathbf{x})$ represents the indexing cost of $\mathbf{x}$ using the sparse-tree structure. A major component in
the complexity (for example, in coding) of a sparse vector $\mathbf{x}$ is the indexing cost for the locations of its
nonzero entries. In fact, the main goal of current reconstruction algorithms in compressed sensing is to
recover the locations of these nonzero entries.
With the sparse prior alone, for a length-$N$ vector $\mathbf{x}$ that has $S(\mathbf{x})$ nonzero entries, we need $\log_2 N$
bits to specify the location of each nonzero entry. Hence, the total indexing cost under the sparse prior
is
$$\text{IndexingCost}_{\text{sparse}}(\mathbf{x}) = S(\mathbf{x}) \cdot \log_2 N. \tag{16}$$
With the sparse-tree prior, as in practical wavelet-based coding schemes, we can efficiently specify
nonzero entries along a tree, in which each nonzero entry can be accessed through its ancestors [7]. For
example, for the 1-D wavelet transform, as can be seen in Figure 2, we need only two bits for each
nonzero coefficient to code the four possibilities that this coefficient has {no child, one child on the left, one
child on the right, or two children} that are nonzero. With tree-based indexing, to specify the locations
of all nonzero coefficients of $\mathbf{x}$ we need to expand to the smallest connected tree $T(\mathbf{x})$ as defined in
(13). Thus, the total indexing cost under the sparse-tree prior is
$$\text{IndexingCost}_{\text{sparse-tree}}(\mathbf{x}) = S_T(\mathbf{x}) \cdot 2. \tag{17}$$
The indexing cost with the sparse-tree prior depends on $S_T(\mathbf{x})$ and the role of $T(\mathbf{x})$. If the nonzero entries
of $\mathbf{x}$ are well-connected on a tree then $S_T(\mathbf{x}) = S(\mathbf{x})$, and hence the indexing cost with the sparse-tree prior
(17) is significantly smaller than the indexing cost with the sparse prior (16). This significant reduction in
indexing cost motivates our formulation of the sparse-tree inverse problem (15). It can also be motivated
from the minimum description length principle [27].
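To give a feel for the size of this reduction, the two costs (16) and (17) can be evaluated for a concrete case. The numbers below ($N = 4096$ and $100$ nonzero entries forming a connected tree) are hypothetical values chosen for illustration, not results from the experiments, and the function names are our own.

```python
import math

def indexing_cost_sparse(num_nonzero, n):
    """Indexing cost (16): log2(N) bits per nonzero entry."""
    return num_nonzero * math.log2(n)

def indexing_cost_sparse_tree(tree_size):
    """Indexing cost (17): two bits per node of the tree T(x)."""
    return 2 * tree_size

# Hypothetical case: N = 4096 and 100 nonzeros forming a connected tree,
# so that S_T(x) = S(x) = 100.
sparse_bits = indexing_cost_sparse(100, 4096)
tree_bits = indexing_cost_sparse_tree(100)
```

Under these assumptions the sparse prior needs 1200 bits of indexing, against 200 bits with the sparse-tree prior, a factor of $\log_2 N / 2 = 6$.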
IV. TREE-BASED ORTHOGONAL MATCHING PURSUIT
A. Algorithm Description
All methods discussed in Section II-B exploit only the sparsity property P1 of a signal $\mathbf{x}$ in the problem
$\mathbf{A}\mathbf{x} = \mathbf{b}$. We now present an improved version of OMP, named Tree-Based Orthogonal Matching Pursuit
(TOMP), that additionally exploits the tree property P2 of $\mathbf{x}$.
The inputs of TOMP are an $M \times N$ measurement matrix $\mathbf{A}$ where $M \ll N$, a data vector $\mathbf{b}$ of
length $M$, and two tuning coefficients: the relaxing coefficient $\alpha$ ($\alpha \in [0, 1]$) and the downward extending
coefficient $d$ ($d \in \mathbb{Z}$, $d \ge 1$), as will be explained later. TOMP returns a reconstruction vector $\mathbf{x}$ of length
$N$, which is the solution of (15) and has a sparse-tree structure.
The key idea of TOMP is to greedily select, from a limited search space of $\mathbf{A}$, sets of columns
corresponding to branches of the tree $T(\mathbf{x})$ defined in (13). Let $\Lambda_k$ be the accumulated selected set,
which stores the indexes of the columns $\mathbf{a}_i$ of $\mathbf{A}$ selected from the beginning until iteration step $k$. The
initial selected set $\Lambda_1$ consists of the indexes of entries that are known to be significant and at the roots of
the tree. For example, in the case of the wavelet transform, all scaling coefficients are significant and
$$\Lambda_1 = \{\text{indexes of all scaling coefficients}\}.$$
The initial residual is
$$\mathbf{r}_1 = \mathbf{b} - P_{\mathrm{span}\{\mathbf{a}_i :\, i \in \Lambda_1\}} \mathbf{b}. \tag{18}$$
For each iteration step $k$ ($k = 2, 3, \ldots$), the TOMP algorithm first forms a candidate set $C_k$ that is
restricted to the $d$-level descendants of the already selected nodes,
$$C_k = \bigcup_{i \in \Lambda_{k-1}} D_d(i), \tag{19}$$
where $D_d(i)$ denotes all descendants of $i$ within $d$ levels on the tree. Hence $d$ is named the downward
extending coefficient.
The TOMP algorithm then evaluates the inner products of the residual $\mathbf{r}_{k-1}$ with the atoms $\mathbf{a}_i$ in $C_k$ and
selects the atoms with the largest inner products as the finalist set $F_k$:
$$F_k = \{ i : i \in C_k \ \text{s.t.}\ |\langle \mathbf{r}_{k-1}, \mathbf{a}_i \rangle| \ge \alpha \max_{j \in C_k} |\langle \mathbf{r}_{k-1}, \mathbf{a}_j \rangle| \}, \tag{20}$$
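where $\alpha$ is a given relaxing coefficient.
The next selected entries are determined based on the whole history $H(i)$:
$$i_k = \arg\min_{i \in F_k} \| \mathbf{b} - P_{\mathrm{span}\{\mathbf{a}_j :\, j \in \Lambda_{k-1} \cup H(i)\}} \mathbf{b} \|_2. \tag{21}$$
Finally, the selected set and residual are updated as
$$\Lambda_k = \Lambda_{k-1} \cup H(i_k), \tag{22}$$
$$\mathbf{r}_k = \mathbf{b} - P_{\mathrm{span}\{\mathbf{a}_i :\, i \in \Lambda_k\}} \mathbf{b}. \tag{23}$$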
Steps (19) to (23) are repeated for each iteration until we encounter a stopping criterion. One possible
stopping criterion is when the number of selected items in $\Lambda_k$ reaches a predefined portion (e.g., half) of
the number of rows in $\mathbf{A}$. Intuitively, we cannot expect to recover a signal with more degrees of freedom
than the number of measurements. Another stopping rule is $\|\mathbf{r}_k\|_2^2 \le \varepsilon$. When $\mathbf{x}$ is not exactly sparse,
$\varepsilon$ can optionally be set to a very small value to eliminate the insignificant coefficients,
as a means of lossy compression. In the noisy case, the value of $\varepsilon$ is based on the noise level in the
measurement data. For example, when $\mathbf{b}$ is affected by additive white Gaussian noise of variance $\sigma^2$,
$\varepsilon$ can be chosen as $M\sigma^2$, the noise energy, where $M$ is the length of $\mathbf{b}$.
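After the last TOMP iteration, the nonzero coefficients of the estimated signal $\mathbf{x}$ are indexed by $\Lambda_k$ and
obtained by solving
$$\sum_{\lambda \in \Lambda_k} x_\lambda \mathbf{a}_\lambda = P_{\mathrm{span}\{\mathbf{a}_i :\, i \in \Lambda_k\}} \mathbf{b}. \tag{24}$$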
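Steps (18)-(23) can be assembled into a compact sketch. The code below is illustrative and rests on several assumptions of our own: the tree is a single heap-indexed binary tree (root 1, children $2i$ and $2i+1$) rather than the wavelet tree with scaling coefficients, atoms are stored in a dictionary keyed by node index, projections are computed by Gram-Schmidt, and already selected nodes are removed from the candidate set. The names `residual`, `history`, `descendants`, and `tomp` are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def residual(b, atoms):
    """Return b minus its orthogonal projection onto span(atoms)."""
    basis = []
    for a in atoms:
        v = list(a)
        for u in basis:
            c = dot(v, u)
            v = [vi - c * ui for vi, ui in zip(v, u)]
        norm = math.sqrt(dot(v, v))
        if norm > 1e-12:  # skip atoms already in the span
            basis.append([vi / norm for vi in v])
    r = list(b)
    for u in basis:
        c = dot(r, u)
        r = [ri - c * ui for ri, ui in zip(r, u)]
    return r

def history(i):
    """H(i): node i and all its ancestors (heap indexing, root = 1)."""
    h = set()
    while i >= 1:
        h.add(i)
        i //= 2
    return h

def descendants(i, d, n_nodes):
    """D_d(i): all descendants of node i within d levels."""
    found, level = set(), [i]
    for _ in range(d):
        level = [c for p in level for c in (2 * p, 2 * p + 1) if c <= n_nodes]
        found.update(level)
    return found

def tomp(atoms, b, roots, alpha=1.0, d=1, max_atoms=None, eps=1e-12):
    """TOMP sketch; atoms maps node index (1..n) to a unit-norm column."""
    n_nodes = len(atoms)
    max_atoms = n_nodes if max_atoms is None else max_atoms
    selected = set(roots)

    def res(index_set):
        return residual(b, [atoms[i] for i in sorted(index_set)])

    def cost(i):  # squared residual norm after adding the history of i
        rr = res(selected | history(i))
        return dot(rr, rr)

    r = res(selected)                                           # (18)
    while len(selected) < max_atoms and dot(r, r) > eps:
        cand = set().union(*(descendants(i, d, n_nodes) for i in selected))
        cand -= selected                                        # (19)
        if not cand:
            break
        score = {i: abs(dot(r, atoms[i])) for i in cand}
        top = max(score.values())
        finalists = [i for i in sorted(cand) if score[i] >= alpha * top]  # (20)
        i_k = min(finalists, key=cost)                          # (21)
        selected |= history(i_k)                                # (22)
        r = res(selected)                                       # (23)
    return selected, r

# Toy example: 7 nodes, atoms are the standard basis of R^7, and the
# nonzero coefficients sit on the branch 1 -> 2 -> 5.
atoms = {i: [1.0 if j == i - 1 else 0.0 for j in range(7)] for i in range(1, 8)}
b = [5.0, 1.0, 0.0, 0.0, 3.0, 0.0, 0.0]
support, r_final = tomp(atoms, b, roots={1}, alpha=1.0, d=2)
```

With $d = 2$ the strong coefficient at node 5 is reachable in the first iteration, and selecting it pulls in its weaker parent (node 2) through the history set, which is the behavior that distinguishes TOMP from OMP.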
B. Comparison between TOMP and OMP
Since TOMP selects entries by expanding a selection tree, the final selected set is a connected sparse-tree.
Moreover, only tree branches that lead to the smallest residual via orthogonal projection are selected
at each iteration.
Equations (19), (21), and (22) mark the major differences between OMP and our proposed TOMP. A
slight modification of the OMP iteration (6)-(7) can be expressed as
$$i_k = \arg\min_{1 \le i \le N} \| \mathbf{b} - P_{\mathrm{span}\{\mathbf{a}_j :\, j \in \{i_1, i_2, \ldots, i_{k-1}\} \cup \{i\}\}} \mathbf{b} \|_2. \tag{25}$$
Thus, comparing (25) with (21), we see that at each iteration OMP expands the selected set by one
entry that minimizes the resulting residual, whereas TOMP expands it by a group (a tree branch) of nodes.
In the formulation (15), if a node $i$ is in the set of significant coefficients then so are all nodes in $H(i)$.
Hence, TOMP is more robust to noise and to a limited number of measurements. In addition,
in each iteration TOMP adds a group of entries to the selected set $\Lambda_k$; thus TOMP requires fewer
iterations than OMP.
The candidate set resulting from (19) restricts the search space for TOMP to only the nodes growing out
from the root nodes, instead of the whole set of nodes as is the case for OMP in (25). The significant
coefficients in $\mathbf{x}$ form a connected tree from the roots. The limited search space helps TOMP to reduce the
number of comparisons in each iteration and to focus on the coefficients with a high probability of being
significant.
Thus $d \in \mathbb{Z}$, $d \ge 1$, and $\alpha \in [0, 1]$ are tuning parameters. A larger $d$ leads to larger candidate sets, so
that we can reach significant coefficients of $\mathbf{x}$ further down the tree, but at a higher computational cost. The
relaxation parameter $\alpha$ allows further restriction of the search space to the finalist set by quick evaluations
of the inner products in (20) instead of costly evaluations of the residual norms in (21). A smaller value of $\alpha$
leads to bigger finalist sets, which means more accurate selection, but also at a higher computational
cost. The following are some special cases for $d$ and $\alpha$:
• $d = \infty$ means the search space contains all nodes.
• $\alpha = 0$ leads to an exhaustive search of all possible history sets within the candidate set to determine
the one leading to the smallest residual.
• $d = 1$ means only one new node, which is directly connected to the already selected set, is selected
at each iteration. In this case the selection step (21) of TOMP can be achieved by evaluating inner
products with the residual $\mathbf{r}_{k-1}$, and thus it is most efficient to set $\alpha = 1$.
• In general, $\alpha = 1$ means only one finalist at each iteration. In this case TOMP is almost like
OMP, except that TOMP selects a whole set $H(i_k)$ rather than only a single $i_k$. If the signal satisfies
our assumption P2, this modification leads to the correct reconstruction, since the coefficients in $H(i)$
are significant whenever coefficient $i$ is significant.
Next, we present two implementations of the key selection step (21) of TOMP, using
Gram-Schmidt orthogonalization (TOMP-GS) and Least Square (TOMP-LS).
C. TOMP with Gram-Schmidt Implementation
Similar to OMP, the implementation of TOMP in this section relies heavily on Gram-Schmidt
orthogonalization; we therefore name it the TOMP-GS version. Whenever a new atom $\mathbf{a}_i$ is selected,
$i$ is stored in the selected set $\Lambda_k$. At the same time, $\mathbf{a}_i$ is orthogonalized with respect to all already
selected atoms and then stored in the set $U_k$, called the Gram-Schmidt selected set. $U_k$ is a set of
atoms, not a set of indexes. In particular, the atoms $\mathbf{a}_{\Lambda_1}$ are Gram-Schmidt orthogonalized and put in $U_1$.
TOMP-GS follows equations (18)-(20). To solve (21), for each $i \in F_k$, we form $\{\mathbf{a}_{H(i) \cap C_k}\}$, called the
subtree corresponding to $i$, which lies entirely inside the search space $C_k$. We orthogonalize each vector in
the subtree against the selected set $U_{k-1}$ and the other vectors in $\{\mathbf{a}_{H(i) \cap C_k}\}$ to form the orthogonalized
subtree $\{\tilde{\mathbf{a}}_{H(i) \cap C_k}\}$ using the Gram-Schmidt algorithm.
We select the subtree that gives the largest projection of the current residual,
$$i_k = \arg\max_{i \in F_k} \| P_{\mathrm{span}\{\tilde{\mathbf{a}}_{H(i) \cap C_k}\}} \mathbf{r}_{k-1} \|_2, \tag{26}$$
where $\mathbf{a}_{i_k}$ is the lowest node on the selected subtree. This gives the solution of (21).
The selected set $\Lambda_k$ and the new residual $\mathbf{r}_k$ are updated through (22) and (23). Equation (23) is
performed by deducting from the residual its projection onto the selected subtree:
$$\mathbf{r}_k = \mathbf{r}_{k-1} - P_{\mathrm{span}\{\tilde{\mathbf{a}}_{H(i_k) \cap C_k}\}} \mathbf{r}_{k-1}. \tag{27}$$
The selected set $U_k$ is updated by adding the selected orthogonalized subtree:
$$U_k = U_{k-1} \cup \{\tilde{\mathbf{a}}_{H(i_k) \cap C_k}\}. \tag{28}$$
We terminate TOMP-GS when the stopping rules discussed in Section IV-A are satisfied.
D. TOMP with Least-Square Implementation
To overcome the computational cost of Gram-Schmidt orthogonalization and the storage cost of a huge measurement
matrix $\mathbf{A}$, we introduce another implementation of TOMP in this section. This implementation uses an
iterative method called Least Square (LSQR) [28] to compute least-squares solutions, so this version is
called TOMP-Least Square (TOMP-LS).
For large signals such as images, the measurement matrix $\mathbf{A}$ can be implemented via a fast algorithm instead
of matrix multiplication, as will be discussed below. Suppose that the signal $\mathbf{y}$ is measured by collecting a
small number of coefficients in a transform domain:
$$\mathbf{b} = \mathbf{M}\mathbf{y} = \mathbf{S}\mathbf{F}\mathbf{y}, \tag{29}$$
where $\mathbf{F}$ is a transform matrix with a fast algorithm (such as the fast Fourier transform), and $\mathbf{S}$ is a
selection matrix that collects a small number of coefficients in $\mathbf{F}\mathbf{y}$ to form $\mathbf{b}$.
Thus the measurement matrix for the sparse vector $\mathbf{x}$ is
$$\mathbf{A} = \mathbf{M}\mathbf{W} = \mathbf{S}\mathbf{F}\mathbf{W}, \tag{30}$$
which can be implemented quickly using the FFT and DWT.
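In the above scheme, it is only possible to perform multiplications with the whole matrix $\mathbf{A}$. The
following are some techniques to perform computations on specific columns of $\mathbf{A}$ indexed by a set $\Lambda$.
Suppose we want to compute the inner products between a residual $\mathbf{r}$ and some columns of $\mathbf{A}$:
$$c_i = |\langle \mathbf{r}, \mathbf{a}_i \rangle| \quad \text{where } i \in \Lambda. \tag{31}$$
We multiply the matrix $\mathbf{A}^T$ by the vector $\mathbf{r}$ to get the vector $\mathbf{c} = \mathbf{A}^T\mathbf{r}$ (whose $i$-th entry is $\langle \mathbf{r}, \mathbf{a}_i \rangle$), then extract the required values $c_i$ at the locations
$i \in \Lambda$ in $\mathbf{c}$ as
$$c_i = |\mathbf{c}[i]| \quad \text{where } i \in \Lambda,\ \mathbf{c} = \mathbf{A}^T\mathbf{r}. \tag{32}$$
Suppose we want to multiply the columns of $\mathbf{A}$ indexed by $\Lambda$ with a vector $\mathbf{z}$ of the same size as $\Lambda$. We
create a zero vector $\bar{\mathbf{z}}$ of size equal to the number of columns of $\mathbf{A}$. We copy the coefficients of $\mathbf{z}$ to the locations $i \in \Lambda$ in
$\bar{\mathbf{z}}$ and perform the matrix multiplication
$$\mathbf{A}_\Lambda \mathbf{z} = \mathbf{A}\bar{\mathbf{z}} \quad \text{where } \bar{\mathbf{z}}[\Lambda] = \mathbf{z},\ \bar{\mathbf{z}}[\bar{\Lambda}] = 0, \tag{33}$$
where $\mathbf{A}_\Lambda$ is the matrix containing all the columns of $\mathbf{A}$ indexed by $\Lambda$. The vector $\bar{\mathbf{z}}[\bar{\Lambda}]$ contains the entries
of $\bar{\mathbf{z}}$ that are not indexed by $\Lambda$.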
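The two column-restricted operations (31)-(33) can be sketched with the operator given only as a pair of functions. In the sketch below, `A_mul` and `At_mul` stand for whatever fast routines compute products with $\mathbf{A}$ and $\mathbf{A}^T$; these names, the 0-based indexing, and the small explicit matrix used for the demonstration are assumptions made for illustration.

```python
def restricted_inner_products(At_mul, r, index_set):
    """(31)-(32): |<r, a_i>| for i in index_set, using one product with A^T."""
    c = At_mul(r)
    return {i: abs(c[i]) for i in index_set}

def restricted_multiply(A_mul, n_cols, index_list, z):
    """(33): A_Lambda z, computed as A z_bar with z_bar zero outside Lambda."""
    z_bar = [0.0] * n_cols
    for i, zi in zip(index_list, z):
        z_bar[i] = zi
    return A_mul(z_bar)

# Demonstration with a small explicit 2 x 3 matrix standing in for a fast operator.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]

def A_mul(x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def At_mul(r):
    return [sum(A[m][j] * r[m] for m in range(len(A))) for j in range(len(A[0]))]

inner = restricted_inner_products(At_mul, [1.0, -1.0], {0, 2})
product = restricted_multiply(A_mul, 3, [0, 2], [1.0, 2.0])
```

Each operation costs one full product with $\mathbf{A}$ or $\mathbf{A}^T$, regardless of how few columns are actually needed, which is the trade-off accepted in exchange for never storing $\mathbf{A}$.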
Now we are ready to describe TOMP-LS. Since we do not have direct access to each single column
of $\mathbf{A}$, we work only with the index sets $\Lambda_k$, $C_k$, and $F_k$. The notation in this section follows exactly
that of Section IV-A.
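In each iteration $k$ ($k = 2, 3, \ldots$), TOMP-LS follows (19)-(20) to form the finalist set $F_k$. The selection
step (21) is implemented in two steps. For each $i \in F_k$, we use LSQR to compute
$$\mathbf{t}^*_i = \arg\min_{\mathbf{t}} \| \mathbf{A}_{\Lambda_{k-1} \cup H(i)} \mathbf{t} - \mathbf{b} \|_2, \quad i \in F_k. \tag{34}$$
We then select the subtree that gives the smallest $\ell_2$-norm distance to the data $\mathbf{b}$, denoted by $i_k$, the lowest-level
node on the subtree:
$$i_k = \arg\min_{i \in F_k} \| \mathbf{A}_{\Lambda_{k-1} \cup H(i)} \mathbf{t}^*_i - \mathbf{b} \|_2. \tag{35}$$
The selected set $\Lambda_k$ is updated as in (22). The estimate of $\mathbf{x}$ after iteration $k$ is
$$\mathbf{x}^{(k)} = \mathbf{t}^*_{i_k}, \tag{36}$$
where $\mathbf{x}^{(k)} \in \mathbb{R}^{|\Lambda_k|}$ contains the estimated nonzero coefficients of $\mathbf{x}$ at the locations $\Lambda_k$. Hence the updated
residual is
$$\mathbf{r}_k = \mathbf{b} - \mathbf{A}_{\Lambda_k} \mathbf{x}^{(k)}. \tag{37}$$
TOMP-LS stops when $\|\mathbf{r}_k\|_2^2 \le \varepsilon$. The preselected value $\varepsilon$ is determined based on the noise level.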
V. TREE-BASED MAJORIZATION-MINIMIZATION
A. Majorization-Minimization Approach
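A Lagrange multiplier can be used to reformulate the inverse problem with the sparse-tree prior into the
following unconstrained optimization problem:
$$\min_{\mathbf{x}} C(\mathbf{x}) \quad \text{where} \quad C(\mathbf{x}) = \|\mathbf{A}\mathbf{x} - \mathbf{b}\|_2^2 + \mu \cdot S_T(\mathbf{x}). \tag{38}$$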
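In this section we follow the Majorization-Minimization (MM) approach [15], [16], [17], [29] in
solving (38). The first step is to majorize $C(\mathbf{x})$ with the following (so-called surrogate) function:
$$Q(\mathbf{x}|\mathbf{y}) = \|\mathbf{A}\mathbf{x} - \mathbf{b}\|_2^2 + \lambda \|\mathbf{x} - \mathbf{y}\|_2^2 - \|\mathbf{A}(\mathbf{x} - \mathbf{y})\|_2^2 + \mu \cdot S_T(\mathbf{x}). \tag{39}$$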
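By choosing $\lambda$ larger than the maximum eigenvalue of $\mathbf{A}^T\mathbf{A}$ we ensure that $Q(\mathbf{x}|\mathbf{y}) \ge C(\mathbf{x})$ for all
$\mathbf{x}$, $\mathbf{y}$, with equality if and only if $\mathbf{x} = \mathbf{y}$. The MM approach amounts to computing a sequence of estimates $\mathbf{x}^{(t)}$ by
iteratively minimizing the majorizing function:
$$\mathbf{x}^{(t+1)} = \arg\min_{\mathbf{x}} Q(\mathbf{x}|\mathbf{x}^{(t)}). \tag{40}$$
The MM iteration monotonically decreases the cost function since
$$C(\mathbf{x}^{(t+1)}) \le Q(\mathbf{x}^{(t+1)}|\mathbf{x}^{(t)}) \le Q(\mathbf{x}^{(t)}|\mathbf{x}^{(t)}) \le C(\mathbf{x}^{(t)}). \tag{41}$$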
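As a result, the MM iteration is guaranteed to converge to a locally optimal solution. We can rewrite
$Q(\mathbf{x}|\mathbf{y})$ in (39) as
$$Q(\mathbf{x}|\mathbf{y}) = \lambda \left( \|\mathbf{x}\|_2^2 - 2 (\mathbf{y} + \lambda^{-1} \mathbf{A}^T (\mathbf{b} - \mathbf{A}\mathbf{y}))^T \mathbf{x} \right) + \mu \cdot S_T(\mathbf{x}) + \left( \|\mathbf{b}\|_2^2 + \lambda \|\mathbf{y}\|_2^2 - \|\mathbf{A}\mathbf{y}\|_2^2 \right). \tag{42}$$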
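Thus, by denoting
$$\mathbf{z} = \mathbf{x}^{(t)} + \lambda^{-1} \mathbf{A}^T (\mathbf{b} - \mathbf{A}\mathbf{x}^{(t)}), \tag{43}$$
(40) is equivalent to
$$\mathbf{x}^{(t+1)} = \arg\min_{\mathbf{x}} \left( \lambda \|\mathbf{x} - \mathbf{z}\|_2^2 + \mu \cdot S_T(\mathbf{x}) \right). \tag{44}$$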
Equations (43) and (44) make up the two steps in each iteration of a tree-based majorize-minimize
(TMM) algorithm. The advantage of the TMM algorithm is that it does not need to store the matrix $\mathbf{A}$
but instead requires only a fast algorithm to compute matrix multiplications by $\mathbf{A}$ and $\mathbf{A}^T$ for the first
step (43). In the next section we develop an efficient algorithm to solve the second step (44).
Notice that the MM approach allows any other regularization term to be used instead of $S_T(\mathbf{x})$ in
the optimization problem (38). In compressed sensing, $\|\mathbf{x}\|_0$ or $\|\mathbf{x}\|_1$ regularization is typically used [15],
[16], [17], [29] for promoting sparse solutions. For example, with the $\|\mathbf{x}\|_1$ regularization term in place of
$S_T(\mathbf{x})$, the second step (44) becomes the componentwise soft thresholding
$$x_n^{(t+1)} = \mathrm{sign}(z_n) \cdot \max\{0, |z_n| - \mu/\lambda\}. \tag{45}$$
We refer to the iteration with (43) and (45) as the MM algorithm for the inverse problem with the sparse prior,
as compared to TMM.
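As a side check on the majorization claim used in this subsection, the gap between the surrogate (39) and the cost (38) can be written out explicitly. The short derivation below uses only the definitions above; $\lambda_{\max}$ denotes the maximum eigenvalue of $\mathbf{A}^T\mathbf{A}$ and $\mathbf{I}$ the identity matrix, both introduced here for convenience.

```latex
\begin{align*}
Q(\mathbf{x}|\mathbf{y}) - C(\mathbf{x})
  &= \lambda \|\mathbf{x} - \mathbf{y}\|_2^2 - \|\mathbf{A}(\mathbf{x} - \mathbf{y})\|_2^2 \\
  &= (\mathbf{x} - \mathbf{y})^T (\lambda \mathbf{I} - \mathbf{A}^T\mathbf{A}) (\mathbf{x} - \mathbf{y}) \\
  &\ge (\lambda - \lambda_{\max}) \|\mathbf{x} - \mathbf{y}\|_2^2 \;\ge\; 0 .
\end{align*}
```

When $\lambda > \lambda_{\max}$ the last bound is strictly positive unless $\mathbf{x} = \mathbf{y}$, which gives the stated equality condition; the regularization term $\mu \cdot S_T(\mathbf{x})$ cancels in the difference, so the same argument applies to any other regularizer.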
B. Fast Best Search in Trees
Let $\theta = \mu/\lambda$. Our task in solving the second step (44) of TMM is to minimize the following objective
function for a given $\mathbf{z}$:
$$J(\mathbf{x}) = \theta \cdot S_T(\mathbf{x}) + \|\mathbf{x} - \mathbf{z}\|_2^2. \tag{46}$$
Recall that the sparse-tree prior term $S_T(\mathbf{x})$ is defined as the number of nodes in the smallest connected
tree $T(\mathbf{x})$ that contains all nonzero entries of $\mathbf{x}$:
$$T(\mathbf{x}) = \bigcup_{x_i \neq 0} H(i). \tag{47}$$
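Denote by $T^c(\mathbf{x})$ the set of remaining entries in $\mathbf{x}$ (those not in $T(\mathbf{x})$), and by $|T(\mathbf{x})|$ the number of nodes in $T(\mathbf{x})$. Since
$x_i = 0$ for $i \in T^c(\mathbf{x})$, we have
$$J(\mathbf{x}) = \theta \cdot |T(\mathbf{x})| + \sum_{i \in T(\mathbf{x})} (x_i - z_i)^2 + \sum_{i \in T^c(\mathbf{x})} (x_i - z_i)^2 \ \ge\ \theta \cdot |T(\mathbf{x})| + \sum_{i \in T^c(\mathbf{x})} z_i^2, \tag{48}$$
with equality if and only if $x_i = z_i$ for all $i \in T(\mathbf{x})$.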
Thus solving (44) amounts to searching for the best tree $T^*$, starting from the root node, that minimizes
$$J(T) = \theta \cdot |T| + \sum_{i \in T^c} z_i^2. \tag{49}$$
The solution $\mathbf{x}^*$ of (44) can be obtained from the minimizer $T^*$ of (49) as
$$x^*_i = \begin{cases} z_i & \text{if } i \in T^*, \\ 0 & \text{else.} \end{cases} \tag{50}$$
We now develop a fast dynamic programming algorithm that iterates from the root of the tree to the
leaves to search for the minimizer tree $T^*$ of (49). This algorithm is similar to the best basis search
algorithm for wavelet packets [30].
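For a node $n$ on the index tree of $\mathbf{z}$, let us denote by $\tau(n)$ the set of all trees that have their root at $n$, including
the empty tree (denoted by $\emptyset$), and by $\mathrm{FT}(n)$ the full tree that has its root at $n$ and grows all the way to the
bottom. We define
$$T^*(n) = \arg\min_{T \in \tau(n)} \left( \theta \cdot |T| + \sum_{i \in \mathrm{FT}(n) \setminus T} z_i^2 \right), \tag{51}$$
$$J^*(n) = \theta \cdot |T^*(n)| + \sum_{i \in \mathrm{FT}(n) \setminus T^*(n)} z_i^2, \tag{52}$$
$$S(n) = \sum_{i \in \mathrm{FT}(n)} z_i^2. \tag{53}$$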
The following proposition gives a recursive computation of these quantities, from the bottom up along the tree
branches. The final solution of (49) is found at the root of the tree.
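Proposition 1 Let $C(n)$ be the set of children of $n$, with $C(n) = \emptyset$ if $n$ is a leaf node. Then
$$S(n) = z_n^2 + \sum_{c \in C(n)} S(c). \tag{54}$$
If $S(n) < \theta + \sum_{c \in C(n)} J^*(c)$ then
$$T^*(n) = \emptyset, \tag{55}$$
$$J^*(n) = S(n). \tag{56}$$
Otherwise
$$T^*(n) = n \cup \bigcup_{c \in C(n)} T^*(c), \tag{57}$$
$$J^*(n) = \theta + \sum_{c \in C(n)} J^*(c). \tag{58}$$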
Proof: The best tree $T^*(n)$ minimizes the following cost function
$$J_n(T) = \theta \cdot |T| + \sum_{i \in \mathrm{FT}(n) \setminus T} z_i^2 = \sum_{i \in T} \theta + \sum_{i \in \mathrm{FT}(n) \setminus T} z_i^2, \tag{59}$$
among all $T \in \tau(n)$. Such a $T$ can be either an empty set or a tree that has its root at $n$. In this second case,
$T$ consists of $n$ and subtrees with roots at the children of $n$:
$$T = n \cup \bigcup_{c \in C(n)} T_c \tag{60}$$
for some $T_c \in \tau(c)$, and thus
$$J_n(T) = \theta + \sum_{c \in C(n)} J_c(T_c). \tag{61}$$
In this case, the cost $J_n(T)$ is minimum if each subtree $T_c$ minimizes its cost $J_c(T_c)$. Hence $T^*(n)$
is either an empty set or $n \cup \bigcup_{c \in C(n)} T^*(c)$. The best tree $T^*(n)$ is obtained by comparing the costs of
these two possibilities.
The recursive algorithm described in Proposition 1 visits each node in $\mathbf{z}$ once, where it takes a constant
number of operations. Thus, the complexity of this algorithm for the second step in each TMM iteration
is $O(N)$, where $N$ is the length of $\mathbf{x}$ or of the signal $\mathbf{y}$.
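The recursion of Proposition 1 translates almost line by line into code. The following sketch assumes a single heap-indexed binary tree (root at index 1, children of $n$ at $2n$ and $2n+1$), which is a simplification of the wavelet tree used in the paper; the function name `best_tree` and the convention that `z[0]` is unused are our own.

```python
def best_tree(z, theta):
    """Minimize (49) over trees rooted at node 1; returns (T*, J*, x*).

    z: list with z[0] unused, so that nodes are indexed 1..len(z)-1.
    """
    n_nodes = len(z) - 1

    def solve(n):
        # Returns (T*(n), J*(n), S(n)) for the subtree rooted at n.
        children = [c for c in (2 * n, 2 * n + 1) if c <= n_nodes]
        results = [solve(c) for c in children]
        s_n = z[n] ** 2 + sum(res[2] for res in results)      # (54)
        keep_cost = theta + sum(res[1] for res in results)
        if s_n < keep_cost:
            return set(), s_n, s_n                            # (55)-(56)
        tree = {n}
        for res in results:
            tree |= res[0]
        return tree, keep_cost, s_n                           # (57)-(58)

    tree, cost, _ = solve(1)
    # Solution (50): keep z on the best tree, zero elsewhere.
    x = [z[i] if i in tree else 0.0 for i in range(len(z))]
    return tree, cost, x

# Toy example with 7 nodes: node 3 is small but its child, node 7, is large.
z = [0.0, 3.0, 0.1, 0.1, 0.1, 0.1, 0.1, 2.5]
tree, cost, x_star = best_tree(z, theta=1.0)
```

In this example the small coefficient at node 3 is retained because it connects the large coefficient at node 7 to the root, whereas a plain componentwise threshold would have discarded it.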
VI. EXPERIMENTAL RESULTS
A. TOMP
Throughout this section, the test signals are piecewise-smooth signals as in Figure 4 (a). An 8-level
Daubechies 4 wavelet transform is applied to the test signals to get sparse-tree signals in the wavelet
domain. The signals are then measured by matrices of i.i.d. Gaussian entries (zero mean and variance 1)
of which all columns are normalized to unit norm.
In the first experiment, we count the number of multiplications from inner products performed by
OMP and TOMP-GS at different values of $\alpha$ and $d$. The piecewise-smooth signal of length 4096 is
reconstructed from 300 measurements. Figure 3 shows that large $\alpha$ and small $d$ lead to a small number
of multiplications. This is because the size of the search space is reduced. When $\alpha \ge 0.7$ and
$d \le 3$, TOMP-GS requires at most about $2.9 \times 10^7$ multiplications while OMP requires about $9.7 \times 10^7$
multiplications. Hence the limited search space helps TOMP-GS to reduce computational costs.
Through the experiments using the above test signal and wavelet transform, we observe that TOMP-GS
gives high reconstruction SNR with low computational cost at $\alpha = 0.975$ and $d = 2$. From now on, the
TOMP-GS experiments are carried out using these two values.
A single reconstruction case is shown in Figure 4, using 300 measurements from a length-4096 piecewise-smooth
signal. All existing methods (OMP, StOMP, and BP) fail to reconstruct the signal, giving only
17.02 dB, 13.10 dB, and 11.41 dB, respectively. TOMP-GS and TOMP-LS provide 25.25 dB and 27.72 dB
[Figure 3 omitted: number of multiplications (scale $\times 10^7$) versus relaxing coefficient alpha (0.5 to 1), with curves labeled l = 1, l = 2, l = 3 and a reference level OMP: $9.71 \times 10^7$ multiplications.]
Fig. 3. TOMP-GS compared to OMP in the number of multiplications from inner products when varying $\alpha$ and $d$.
reconstruction, which improves more 10 dB over existing methods. As shown in Figure 4 (k-l), TOMP-
GS and TOMP-LS can preserve the tree structure of the original signal while existing methods OMP,
StOMP, and BP cannot. Moreover, TOMP-GS requires only about6% of the number of multiplications
required by OMP.
We next reconstruct the piecewise-smooth signal of length 4096 from different numbers of measurements
using OMP, StOMP, BP, TMP [14], TOMP-GS, and TOMP-LS. For each number of measurements,
we average the reconstruction SNR over 10 trials with different randomly generated measurement matrices
M. We focus on the regime of small numbers of measurements and compare the SNRs resulting from
TOMP and the other methods. Figure 5 shows the comparisons, which again demonstrate significant gains
for the proposed TOMP methods.
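The averaging can be sketched as below. The SNR definition used here, 10 log10(||x||^2 / ||x - x_hat||^2), is
the usual one and is an assumption, since this section does not restate the definition. Taking the arithmetic
mean of the per-trial dB values is also an assumption. The function names are illustrative.

```python
import math


def snr_db(x, x_hat):
    """Reconstruction SNR in dB, assuming SNR = 10 log10(||x||^2 / ||x - x_hat||^2)."""
    signal = sum(v * v for v in x)
    error = sum((v - w) ** 2 for v, w in zip(x, x_hat))
    if error == 0.0:
        return math.inf
    return 10.0 * math.log10(signal / error)


def average_snr_db(x, reconstructions):
    """Arithmetic mean of the per-trial SNR values in dB (one reconstruction per trial)."""
    values = [snr_db(x, x_hat) for x_hat in reconstructions]
    return sum(values) / len(values)


# Toy example: signal energy 25 and error energy 0.25 give a ratio of 100.
x = [3.0, 4.0]
single = snr_db(x, [3.0, 3.5])
mean = average_snr_db(x, [[3.0, 3.5], [3.0, 4.5]])
```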
The experiments described above demonstrate that TOMP can provide better and faster reconstruction
than methods that rely only on sparse representations. Like OMP, TOMP-GS incurs the cost of Gram-Schmidt
orthogonalization and the cost of storing the selected columns of A. When signals are large, such as
images, the measurement matrix A may not fit in computer memory. For example,
reconstructing an image of size 256 × 256 from a number of measurements equal to only 10% of its size
requires a 6554 × 65536 matrix A, which takes about 3 GB of memory to store in double precision.
It is therefore impractical to extract single columns of A for comparison and orthogonalization and to
store all selected columns. TOMP-GS can be used to reconstruct signals a few thousand entries in
length. For larger signals, we have to resort to methods like TOMP-LS and TMM that use fast algorithms
for computing matrix multiplication with A.
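The storage figure can be checked with a short calculation, assuming 8 bytes per double-precision entry and a
dense, row-by-column layout with no overhead. The helper name `dense_matrix_bytes` is illustrative.

```python
def dense_matrix_bytes(rows, cols, bytes_per_entry=8):
    """Memory needed to store a dense rows-by-cols matrix, ignoring any container overhead."""
    return rows * cols * bytes_per_entry


n = 256 * 256              # number of pixels: 65536
m = round(0.10 * n)        # 10% of the image size: 6554 measurements

total_bytes = dense_matrix_bytes(m, n)   # 3436183552 bytes
total_gib = total_bytes / 2 ** 30        # about 3.2 GiB
```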
[Figure 4 panels: (a) Original signal; (b) OMP (17.02 dB); (c) StOMP (13.10 dB); (d) BP (11.41 dB);
(e) Original tree; (f) OMP tree; (g) StOMP tree; (h) BP tree; (i) TOMP-GS (25.25 dB);
(j) TOMP-LS (27.72 dB); (k) TOMP-GS tree; (l) TOMP-LS tree.]
Fig. 4. An example piecewise-smooth signal of length 4096 and its reconstructions from 300 linear measurements using OMP,
StOMP, BP, TOMP-GS, and TOMP-LS, together with tree structures showing locations of significant coefficients recovered by
those methods.
When the Bumps signal of length 4096 is used as the test signal (Figure 6), TOMP-LS returns a
reconstruction SNR of 48.26 dB from 600 samples, far better than StOMP (16.21 dB) and BP (11.61 dB).
[Figure 5 plot: reconstruction SNR (dB) versus number of measurements for TOMP-LS, TOMP-GS, StOMP, OMP,
TMP, and BP.]
Fig. 5. Comparisons between different reconstruction methods of a piecewise-smooth signal using BP, OMP, StOMP, TMP
[14], TOMP-GS, and TOMP-LS.
[Figure 6 panels: (a) Original signal; (b) BP (11.61 dB); (c) StOMP (16.21 dB); (d) TOMP-LS (48.26 dB);
(e) Original tree; (f) BP tree; (g) StOMP tree; (h) TOMP-LS tree.]
Fig. 6. The Bumps signal of length 4096 and its reconstructions from 600 linear measurements using BP, StOMP, and TOMP-LS,
together with tree structures showing locations of significant coefficients recovered by those methods.
The size of the selected set in TOMP-LS is not restricted by computation and storage limitations. In
addition, TOMP-LS recomputes the values of all significant entries after each iteration. When x_i and x_j
are both nonzero, the corresponding columns a_i and a_j can be parallel. TOMP-GS can add both of them to
the selected set; however, one of them becomes zero after Gram-Schmidt orthogonalization. TOMP-LS
overcomes this limitation by LSQR recomputation at each iteration.
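The effect of parallel columns on Gram-Schmidt can be illustrated with a small sketch. It is a generic
orthogonalization routine written for this example, not the TOMP-GS implementation. The tolerance `tol` and
the two toy columns are assumptions.

```python
import math


def gram_schmidt(columns, tol=1e-12):
    """Orthonormalize columns in order; a column dependent on earlier ones yields a zero vector."""
    basis = []
    out = []
    for c in columns:
        r = list(c)
        # Remove the components of r along every previously accepted direction.
        for q in basis:
            coeff = sum(a * b for a, b in zip(q, r))
            r = [a - coeff * b for a, b in zip(r, q)]
        norm = math.sqrt(sum(a * a for a in r))
        if norm > tol:
            q = [a / norm for a in r]
            basis.append(q)
            out.append(q)
        else:
            out.append([0.0] * len(c))
    return out


# Two unit-norm columns that are parallel (a_j = -a_i).
a_i = [0.6, 0.8, 0.0]
a_j = [-0.6, -0.8, 0.0]
q_i, q_j = gram_schmidt([a_i, a_j])   # q_j is the zero vector
```

In this sketch the second column contributes no new direction, so a coefficient attached to it cannot be
recovered from the orthogonalized set alone.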
When we use the test signals Sinc and Doppler, the tree structures are not well connected, as shown in
Figures 7 and 8. However, TOMP-LS still outperforms BP and StOMP. Although TOMP-LS selects insignificant
entries to guarantee the tree structure, their values are set to zero by means of projection. Therefore, the
reconstructed signal is still close to the original.
[Figure 7 panels: (a) Original signal; (b) BP (19.07 dB); (c) StOMP (20.21 dB); (d) TOMP-LS (37.25 dB);
(e) Original tree; (f) BP tree; (g) StOMP tree; (h) TOMP-LS tree.]
Fig. 7. The Sinc signal of length 1020 and its reconstructions from 400 linear measurements using BP, StOMP, and TOMP-LS,
together with tree structures showing locations of significant coefficients recovered by those methods.
[Figure 8 panels: (a) Original signal; (b) BP (32.89 dB); (c) StOMP (45.89 dB); (d) TOMP-LS (58.05 dB);
(e) Original tree; (f) BP tree; (g) StOMP tree; (h) TOMP-LS tree.]
Fig. 8. The Doppler signal of length 4096 and its reconstructions from 700 linear measurements using BP, StOMP, and
TOMP-LS, together with tree structures showing locations of significant coefficients recovered by those methods.
B. TMM
We use TMM to reconstruct a 1-D piecewise-smooth signal of length 4096 from 300 measurements.
The measurement matrix A has i.i.d. Gaussian entries (zero mean and variance 1). It can be seen from
Figure 9(e),(f) that TMM using tree-thresholding can recover the tree structure, while MM suffers from
noise due to the wrong allocation of significant coefficients. As a result, TMM provides a reconstruction
SNR of 20.03 dB, compared to 9.89 dB from MM.
[Figure 9 panels: (a) Original; (b) MM (9.89 dB); (c) TMM (20.03 dB); tree structures: (d) Original;
(e) MM; (f) TMM.]
Fig. 9. An example piecewise-smooth signal of length 4096 and its reconstructions from 300 linear measurements using MM
and TMM, together with tree structures showing locations of significant coefficients and those recovered by those methods.
[Figure 10 panels: (a) Original; (b) MM; (c) TMM. The panel titles for (b) and (c) report
Lambda = 4136.96 and Muy = 2068.48.]
Fig. 10. An example simple box image and its reconstructions from 12 radial scan lines in the frequency domain using MM
and TMM.
We then use TMM to reconstruct a simple box image and compare the result to that from MM. Using
the scheme in (30), we measure the box image in Figure 10(a) by collecting coefficients along 12 radial
lines in its frequency domain. The results show that TMM with tree-thresholding also gives
a better reconstruction than MM in this case.
In the above experiment, the image is small and simple. The Haar wavelet, the simplest type of
wavelet transform, is applied to the box image. For larger and more complicated images, we can
employ different types of wavelet transforms to get sparse representations that have well-organized tree
structure. In addition, we can also use more complicated thresholding criteria, such as a different
threshold for each wavelet level.
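To illustrate why the Haar wavelet suits a piecewise-constant image such as the box, the following sketch
applies a multi-level orthonormal Haar transform to a 1-D piecewise-constant signal. The 1-D setting, the
function names, and the toy signal are assumptions made for brevity; the experiment itself uses a 2-D image.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform (input length must be even)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_transform(signal, levels):
    """Return (coarsest approximation, list of detail bands from finest to coarsest)."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details


# A single step edge: four samples at 4 followed by four samples at 1.
approx, details = haar_transform([4.0] * 4 + [1.0] * 4, levels=3)

# Only the coarsest detail band sees the edge; the finer bands are zero.
nonzero_details = sum(1 for band in details for v in band if abs(v) > 1e-9)
```

Per-level thresholds of the kind mentioned above could then be applied band by band to `details`.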
VII. CONCLUDING REMARKS
Current compressed sensing methods employ the sparsity of unknown signals as the key prior for
reconstruction. In this paper, we promote the tree structure as an additional important prior to get better
and faster reconstruction. We formulate the sparse-tree inverse problem with the sparse-tree prior.
We introduce two reconstruction algorithms based on the sparse-tree prior, namely the TOMP algorithm
and the TMM algorithm. Experimental results show that the tree-based algorithms provide significant gains
over the existing ones in terms of reconstructed signal quality. In particular, TOMP provides better
reconstruction than OMP, StOMP, and BP, while TMM outperforms MM.
The TOMP-GS version, based on Gram-Schmidt orthogonalization, can serve as a basic tool in later
mathematical analysis. To overcome the limitations on signal size and storage cost, the TOMP-LS version is
introduced. However, the running time, although much lower than that of existing algorithms, is still an
issue for the TOMP algorithm. TMM appears to be the most promising algorithm in terms of signal sizes and
running time.
REFERENCES
[1] Y. Bresler, M. Gastpar, and R. Venkataramani, “Image compression on-the-fly by universal sampling in Fourier imaging
systems,” in IEEE Information Theory Workshop on Detection, Estimation, Classification, and Imaging, Santa Fe, USA,
1999.
[2] M. Vetterli, P. Marziliano, and T. Blu, “Sampling signals with finite rate of innovation,” IEEE Trans. Signal Proc., vol. 50,
no. 6, pp. 1417–1428, Jun. 2002.
[3] E. J. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete
frequency information,” IEEE Trans. Info. Theory, vol. 52, pp. 489–509, Feb. 2006.
[4] D. L. Donoho, “Compressed sensing,” IEEE Trans. Info. Theory, vol. 52, pp. 1289–1306, Apr. 2006.
[5] E. J. Candes and T. Tao, “Near optimal signal recovery from random projections: Universal encoding strategies?” IEEE
Trans. Info. Theory, vol. 52, pp. 5406–5425, Dec. 2006.
[6] J. A. Tropp and A. C. Gilbert, “Signal recovery from partial information via orthogonal matching pursuit,” IEEE Trans.
Info. Theory, vol. 53, pp. 4655–4666, Dec. 2007.
[7] J. M. Shapiro, “Embedded image coding using zerotrees of wavelet coefficients,” IEEE Transactions on Signal Processing,
Special Issue on Wavelets and Signal Processing, vol. 41, no. 12, pp. 3445–3462, Dec. 1993.
[8] A. Said and W. A. Pearlman, “A new, fast, and efficient image codec based on set partitioning in hierarchical trees,” IEEE
Trans. Circuits and Systems for Video Technology, vol. 6, no. 3, pp. 243–250, Jun. 1996.
[9] M. Crouse, R. D. Nowak, and R. G. Baraniuk, “Wavelet-based signal processing using hidden Markov models,” IEEE
Trans. Signal Proc. (Special Issue on Wavelets and Filterbanks), vol. 46, pp. 886–902, Apr. 1998.
[10] M. J. Wainwright, E. P. Simoncelli, and A. S. Willsky, “Random cascades on wavelet trees and their use in modeling and
analyzing natural images,” Journal of Appl. and Comput. Harmonic Analysis, vol. 11, pp. 89–123, Jul. 2001.
[11] A. Cohen, W. Dahmen, I. Daubechies, and R. DeVore, “Tree approximation and optimal encoding,” Journal of Appl. and
Comput. Harmonic Analysis, vol. 11, pp. 192–226, Dec. 2001.
[12] C. La and M. N. Do, “Signal reconstruction using sparse tree representation,” in Proc. of SPIE Conf. on Wavelet Applications
in Signal and Image Processing, San Diego, Aug. 2005.
[13] C. La and M. N. Do, “Tree-based orthogonal matching pursuit algorithm for signal reconstruction,” in Proc. IEEE Int. Conf.
on Image Proc., Atlanta, USA, Oct. 2006.
[14] M. F. Duarte, M. B. Wakin, and R. G. Baraniuk, “Fast reconstruction of piecewise smooth signals from random projections,”
in Proc. of Workshop on Signal Processing with Adaptative Sparse Structured Representations, Rennes, France, Nov. 2005.
[15] I. Daubechies, M. D. Friese, and C. D. Mol, “An iterative thresholding algorithm for linear inverse problems with a sparsity
constraint,” Comm. Pure and Applied Math., vol. 57, pp. 3601–3608, 2004.
[16] M. Figueiredo, J. Bioucas-Dias, and R. Nowak, “Majorization-minimization algorithms for wavelet-based image
restoration,” IEEE Trans. on Image Processing, vol. 16, no. 12, pp. 2980–2991, 2007.
[17] M. Elad, B. Matalon, and M. Zibulevsky, “Coordinate and subspace optimization methods for linear least squares with
non-quadratic regularization,” Journal of Appl. and Comput. Harmonic Analysis, pp. 346–367, 2007.
[18] M. N. Do and C. N. H. La, “Tree-based majorize-minimize algorithm for compressed sensing with sparse-tree prior,” in
Proc. Computational Advances in Multi-Sensor Adaptive Processing, U.S. Virgin Islands, 2007.
[19] R. Baraniuk, V. Cevher, M. Duarte, and C. Hegde, “Model-based compressive sensing,” Preprint, 2008.
[20] G. Davis, S. Mallat, and M. Avellaneda, “Greedy adaptive approximation,” J. Constr. Approx., vol. 13, pp. 57–98, 1997.
[21] D. L. Donoho, Y. Tsaig, I. Drori, and J.-L. Starck, “Sparse solution of underdetermined linear equations by stagewise
orthogonal matching pursuit,” vol. 8, Preprint, 2007.
[22] D. Needell and J. A. Tropp, “CoSaMP: Iterative signal recovery from incomplete and inaccurate samples,” Preprint, 2008.
[23] W. Dai and O. Milenkovic, “Subspace pursuit for compressive sensing: Closing the gap between performance and
complexity,” Preprint, 2008.
[24] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,” SIAM J. Sci. Comput., vol. 20,
no. 1, pp. 33–61, 1999.
[25] S. Mallat, A Wavelet Tour of Signal Processing, 2nd ed. Academic Press, 1999.
[26] P. L. Dragotti and M. Vetterli, “Wavelet footprints: Theory, algorithms and applications,” IEEE Trans. Signal Proc., vol. 51,
pp. 1306–1323, May 2003.
[27] A. Barron, J. Rissanen, and B. Yu, “The minimum description length principle in coding and modeling,” IEEE Trans. Info.
Theory, vol. 44, no. 6, pp. 2743–2760, Oct. 1998.
[28] C. C. Paige and M. A. Saunders, “LSQR: An algorithm for sparse linear equations and sparse least squares,” ACM Trans.
Math. Soft., vol. 8, pp. 43–71, 1982.
[29] T. Blumensath and M. E. Davies, “Iterative thresholding for sparse approximations,” Journal of Fourier Analysis and
Applications, vol. 14, pp. 629–654, 2008.
[30] R. R. Coifman and M. V. Wickerhauser, “Entropy-based algorithms for best basis selection,” IEEE Transactions on
Information Theory, Special Issue on Wavelet Transforms and Multiresolution Signal Analysis, vol. 38, no. 2, pp. 713–718,
Mar. 1992.