
Tree-based Algorithms for Compressed Sensing with Sparse-Tree Prior

Chinh N. H. La and Minh N. Do

Abstract

Recent studies have shown that sparse representation can be used effectively as a prior in linear inverse problems. However, in many multiscale bases (e.g. wavelets), signals of interest (e.g. piecewise-smooth signals) not only have few significant coefficients, but also those significant coefficients are well-organized in trees. We propose to exploit this prior, named sparse-tree, for linear inverse problems

with limited numbers of measurements. Toward this end, we present two efficient and effective algorithms

named Tree-based Orthogonal Matching Pursuit (TOMP) and Tree-based Majorization-Minimization

(TMM). Numerical results show that tree-based algorithms provide significantly better reconstruction

quality compared to methods relying only on the sparse prior.

Index Terms

compressed sensing, linear inverse problems, sparse representations, sparse-tree prior, tree structures, wavelets, greedy algorithms, tree-based orthogonal matching pursuit, tree-based majorization-minimization.

I. INTRODUCTION

A recent series of remarkable works has shown that the sparse representation of an unknown signal in a certain basis can be used effectively as prior knowledge for signal reconstruction from a limited number of measurements [1], [2], [3], [4], [5], [6]. An unknown signal of length N is said to have sparse

C. N. H. La was with the Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign,

Urbana IL 61801. He is now with Intel. (email: [email protected]).

M. N. Do is with the Department of Electrical and Computer Engineering, the Coordinated Science Laboratory, and the

Beckman Institute, University of Illinois at Urbana-Champaign, Urbana IL 61801 (email: [email protected]).

This work was supported by the US National Science Foundation under Grant CCF 06-35234 and a Vietnam Education

Foundation Fellowship.

March 31, 2009 DRAFT


representation if it can be well approximated by only K significant coefficients, where K ≪ N, in a fixed basis. Significant coefficients are defined as those with magnitude greater than a certain threshold. However, the locations of these K significant coefficients are unknown and depend on the signal. It has been shown in [3], [4], [5], [6] that under certain conditions, with high probability, the unknown signal can be reconstructed accurately using only O(K log N) nonadaptive linear measurements. These new

results could have great impact in a wide range of inverse problems from remote sensing to medical

imaging where, due to physical constraints, only a limited number of measurements can be acquired

from the unknown object.

Figure 1 (a) and (b) demonstrate the power of sparse representations via accurate reconstruction of a length N = 256 signal by keeping only the K = 32 most significant wavelet coefficients.

[Figure 1: three panels. (a) Original (M = 256); (b) Reconstructed from K = 32 coefficients; (c) Wavelet coefficients.]

Fig. 1. Example of sparse representation: (a) original signal with N = 256 samples; (b) wavelet approximation using K = 32 coefficients; (c) wavelet coefficients.

However, in many multiscale bases (e.g., wavelets), objects of interest (e.g., piecewise-smooth signals)

have significant coefficients that are not only few in number, but also well-organized in trees. Figure 1 (c) shows an example of these cases. In fact, the embedded tree structures for significant wavelet coefficients have been used successfully in compression [7], [8], modeling [9], [10], and approximation [11]. Intuitively, a general sparse representation with K coefficients can be described with 2K numbers: K for the values of the significant coefficients and another K for the indexing of these coefficients. If the

K significant coefficients are known to be organized in trees, then the indexing cost and hence the total


description of the unknown signal can be significantly reduced.

We propose to exploit the sparse-tree representation as additional prior information for signal reconstruction with a limited number of measurements. Exploiting this embedded tree structure in addition

to the sparse prior in inverse problems would potentially lead to: (1) better reconstructed signals, (2)

reconstruction using fewer measurements, and (3) faster reconstruction algorithms.

In [12], [13] we presented a preliminary version of the Tree-based Orthogonal Matching Pursuit (TOMP) algorithm

for signal reconstructions using sparse-tree prior. The idea of using sparse-tree representation for signal

reconstruction was also independently developed in [14] with the main focus on obtaining fast recon-

struction algorithms. Our focus is to exploit the sparse-tree representation to obtain better reconstruction

compared to existing methods like Orthogonal Matching Pursuit (OMP) [6], Basis Pursuit (BP) [4], [5]

and iterated shrinkage or Majorization-Minimization (MM) [15], [16], [17] that only use the sparse prior.

In [18] we extended the MM algorithm to exploit sparse-tree prior leading to the Tree-based Majorization-

Minimization (TMM) algorithm. A theoretical analysis of model-based compressive sensing including

sparse-tree prior was recently developed in [19].

The outline of the paper is as follows. We review the existing sparse inverse problem and compressed sensing reconstruction algorithms in Section II. We formulate the sparse-tree inverse problem in Section III. The TOMP algorithm is presented in Section IV and the TMM algorithm is presented in Section V.

Numerical experiments in Section VI show the superior performance of tree-based modified algorithms

over the original ones. We conclude the paper in Section VII.

II. BACKGROUND

A. Compressed Sensing and Sparse Inverse Problem

For an unknown signal y of length N, suppose that we can only acquire a limited number M of non-adaptive linear measurements (M ≪ N):

$$\mathbf{M} y = b, \tag{1}$$

where $\mathbf{M}$ is a fixed M × N measurement matrix, and b is a length-M vector that contains the measured data. The inverse problem is to reconstruct the signal y from b.

Suppose that y has a sparse expansion via a fixed N × N transform matrix W as

$$y = Wx, \tag{2}$$



where at most K out of the N entries (K ≪ N) of x are nonzero or significant. Let $A = \mathbf{M} W$, which is also a known M × N matrix; then the inverse problem (1) becomes

$$Ax = b, \tag{3}$$

and substituting x into (2) gives y.

Since M ≪ N, both (1) and (3) are underdetermined systems. Thus, in order to solve this inverse problem we need to exploit the sparse prior of x. Intuitively, we can recover x from fewer measurements since most of the coefficients are insignificant and can be discarded. The sparsity of x is measured by its number of nonzero coefficients:

$$S(x) = \|x\|_0 = \operatorname{size}\{i : x_i \neq 0\}. \tag{4}$$

To solve (3), current methods use the sparsity prior and aim to search for the sparsest solution:

$$\min_x S(x) \quad \text{subject to} \quad Ax = b. \tag{5}$$

We refer to (5) as the sparse inverse problem.

B. Existing Reconstruction Algorithms

Since (5) is an NP-hard problem [20], one approach is to use a greedy search method such as Orthogonal Matching Pursuit (OMP) [20], [6]. In the OMP method, the measurement matrix A is considered as a dictionary with the columns $a_i$ of A as atoms. Throughout the paper, we assume that the $a_i$ are normalized such that $\|a_i\|_2 = 1$. Each OMP iteration searches the dictionary for an atom corresponding to a significant coefficient in x and estimates the value of this coefficient by means of orthogonal projections. Specifically, starting with $r_0 = b$, at the k-th iteration the OMP algorithm selects the k-th atom as

$$i_k = \arg\max_{1 \le i \le N} |\langle r_{k-1}, a_i \rangle|, \tag{6}$$

and updates the residual as

$$r_k = b - P_{\operatorname{span}\{a_{i_1}, a_{i_2}, \ldots, a_{i_k}\}} b, \tag{7}$$

where $P_S$ denotes the orthogonal projection onto a subspace S.

There are several improved versions of OMP, including StOMP [21], CoSaMP [22], and Subspace

Pursuit [23], where each iteration selects more than one atom and could include a backtracking or

pruning step.
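The basic OMP iteration (6)-(7) can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes A is given as a list of rows with unit-norm columns, and it maintains an orthonormal basis of the selected atoms by Gram-Schmidt so that the residual update (7) reduces to subtracting one projection. The names `omp`, `dot`, and `num_atoms` are ours.

```python
import math

def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def omp(A, b, num_atoms):
    """Greedy OMP sketch. A is a list of rows whose columns have unit norm.

    Returns (selected column indices, final residual).
    """
    n_cols = len(A[0])
    atoms = [[row[i] for row in A] for i in range(n_cols)]
    residual = list(b)   # r_0 = b
    selected = []        # i_1, i_2, ...
    basis = []           # orthonormal basis of span{a_i1, ..., a_ik}
    for _ in range(num_atoms):
        remaining = [i for i in range(n_cols) if i not in selected]
        if not remaining:
            break
        # Selection step (6): atom most correlated with the residual.
        i_k = max(remaining, key=lambda i: abs(dot(residual, atoms[i])))
        # Gram-Schmidt: orthogonalize the new atom against the basis.
        q = list(atoms[i_k])
        for u in basis:
            c = dot(q, u)
            q = [qi - c * ui for qi, ui in zip(q, u)]
        norm = math.sqrt(dot(q, q))
        if norm < 1e-12:   # atom already lies in the selected span
            break
        q = [qi / norm for qi in q]
        selected.append(i_k)
        basis.append(q)
        # Residual update (7): the residual is orthogonal to the earlier
        # basis vectors, so only the projection onto q is removed.
        c = dot(residual, q)
        residual = [ri - c * qi for ri, qi in zip(residual, q)]
    return selected, residual
```

For example, with the columns e1, e2, e3 and (e1 + e2)/√2 and the data b = (2, 0, 0.5), two iterations select the first and third columns and leave a zero residual.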



As an alternative to OMP, the Basis Pursuit or $\ell_1$-minimization approach [24], [3], [4], [5] relaxes the $\ell_0$-norm in problem (5) to the convex $\ell_1$-norm; that is, it solves

$$x_{\mathrm{BP}} = \arg\min_x \|x\|_1 \quad \text{subject to} \quad Ax = b. \tag{8}$$

The $\ell_1$-minimization problem (8) can be solved by linear programming or interior-point methods.

Finally, there is a class of algorithms called iterated shrinkage or Majorization-Minimization [15], [16], [17] that solve the equivalent regularization problem

$$x = \arg\min_x \|Ax - b\|_2^2 + \mu \|x\|_1, \tag{9}$$

through iterations, each of which consists of two steps:

$$z = x^{(t)} + \lambda^{-1} A^T (b - A x^{(t)}), \tag{10}$$

$$x_n^{(t+1)} = \operatorname{sign}(z_n) \cdot \max\{0, |z_n| - \mu/\lambda\}, \tag{11}$$

where λ is the maximum eigenvalue of $A^T A$. The first step is a gradient descent step for solving Ax = b, while the second step is a componentwise soft thresholding for enforcing sparsity.
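One iteration of (10)-(11) can be written out as follows. This is only a sketch under simplifying assumptions: A is a small dense matrix stored as a list of rows, λ and µ are supplied by the caller, and the threshold µ/λ is taken exactly as written in (11). The helper names (`matvec`, `rmatvec`, `soft_threshold`, `mm_step`) are ours.

```python
import math

def matvec(A, x):
    """Compute A x for A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def rmatvec(A, r):
    """Compute A^T r for A given as a list of rows."""
    return [sum(A[m][n] * r[m] for m in range(len(A)))
            for n in range(len(A[0]))]

def soft_threshold(z, thresh):
    """Componentwise soft thresholding: sign(z_n) * max(0, |z_n| - thresh)."""
    out = []
    for zn in z:
        mag = abs(zn) - thresh
        out.append(math.copysign(mag, zn) if mag > 0 else 0.0)
    return out

def mm_step(A, b, x, lam, mu):
    """One iterated-shrinkage step: gradient step (10), then shrinkage (11)."""
    residual = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    grad = rmatvec(A, residual)
    z = [xn + gn / lam for xn, gn in zip(x, grad)]
    return soft_threshold(z, mu / lam)
```

With A equal to the 2 × 2 identity, λ = 1, µ = 1, and a zero starting point, the data b = (3, −0.5) is mapped to (2, 0): the large entry is shrunk by µ/λ and the small one is set to zero.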

III. SPARSE-TREE INVERSE PROBLEM

A. The Sparse-Tree Characteristic of Signals

If the transform in (2) is an L-level 1-D wavelet transform, then the entries of x consist of

$$\left\{ \{x^{(s)}_{L,p}\}_{1 \le p \le M/2^L},\ \{x^{(w)}_{l,p}\}_{1 \le l \le L,\ 1 \le p \le M/2^l} \right\},$$

where $x^{(s)}_{L,p}$ are the scaling coefficients and $x^{(w)}_{l,p}$ are the wavelet coefficients. These coefficients are arranged in a tree structure [25]. In the notation $x^{(w)}_{l,p}$, l is the level and p is the index in that level. Hence the vector x resulting from the 1-D wavelet transform can be arranged as a binary tree as in Figure 2. A wavelet coefficient $x^{(w)}_{l,p}$ has two children, $x^{(w)}_{l-1,2p-1}$ and $x^{(w)}_{l-1,2p}$. Thus, the entries in x can be specified either in the vector form $x_i$ or in the tree-structured form $x_{l,p}$. The binary tree structure of the 1-D wavelet transform can be extended to multi-dimensional wavelet transforms, such as a quad-tree for images in 2-D.
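The parent-child relation among wavelet coefficients can be made concrete with two small index maps. The sketch below assumes, as in the notation above, that p is 1-based, that level L is the coarsest wavelet level, and that level-1 coefficients have no children; the function names are ours.

```python
def children(l, p):
    """Children of wavelet coefficient (l, p): (l-1, 2p-1) and (l-1, 2p)."""
    if l <= 1:
        return []          # finest level: leaf nodes
    return [(l - 1, 2 * p - 1), (l - 1, 2 * p)]

def parent(l, p, L):
    """Parent of wavelet coefficient (l, p), or None at the coarsest level L."""
    if l >= L:
        return None
    return (l + 1, (p + 1) // 2)
```

The two maps are inverse to each other: every child returned by `children(l, p)` has `(l, p)` as its parent.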

Examining the wavelet representation x of a piecewise-smooth signal, as in Figure 1(c), we identify the following two distinguishing properties that together form the sparse-tree prior:

P1 Vector x has sparse structure; i.e., only a few entries in x are nonzero or significant.

P2 Those nonzero or significant entries of x are well connected in a tree structure (see below).

P1 is the commonly used prior in compressed sensing as discussed in Section II-A. However, P2 is an important additional prior that has not been considered by the current inverse problems (5), (8), (9). Property



[Figure 2: tree diagram over Level 0 to Level 3, with the scaling coefficient and the wavelet coefficients labeled. Black node: significant coefficient. White node: insignificant coefficient.]

Fig. 2. The tree structures of an example 1-D wavelet coefficient set: Black nodes represent significant coefficients; white

nodes represent insignificant coefficients.

P2 holds because significant wavelet coefficients in x correspond to discontinuities in the original signal y [25], and each discontinuity generates a set of nonzero wavelet coefficients in a “cone of influence” referred to as a wavelet footprint [26]. In particular, if a coefficient is nonzero or significant then its ancestors are likely nonzero or significant. As mentioned before, a coefficient is considered significant when its magnitude is bigger than a certain threshold $\epsilon_0$. If the signal is strictly sparse we can let $\epsilon_0 = 0$, meaning that nonzero entries are significant. Property P2 implies that if either $x^{(w)}_{l-1,2p-1}$ or $x^{(w)}_{l-1,2p}$ is significant, then $x^{(w)}_{l,p}$ is likely significant. Therefore, the significant coefficients of x themselves form a connected tree as illustrated in Figure 2.

B. The Sparse-Tree Inverse Problem

We denote by H(i) the index set of the history nodes of the node i, that is, all nodes on the tree branch from the node i up to the root level. If I is an index set, then $x_I$ denotes the set of entries in x with indexes from I:

$$x_I = \{x_i : x_i \in x,\ i \in I\}. \tag{12}$$

Let T(x) be the smallest connected tree that contains the indexes of all the nonzero coefficients of x:

$$T(x) = \bigcup_{x_i \neq 0} H(i). \tag{13}$$

Like (4), the size of the tree T(x) is measured by

$$S_T(x) = \operatorname{size}(T(x)). \tag{14}$$

Comparing with the sparsity measure (4), we see that if the nonzero entries of x are connected in a tree then $S_T(x) = S(x)$. Otherwise, $S_T(x) \gg S(x)$.
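A short sketch may help to fix the definitions (12)-(14). It assumes the tree is described by a hypothetical `parent` dictionary mapping each node index to its parent (None at the root), and that x is a dictionary from node index to coefficient value; neither representation comes from the paper.

```python
def history(i, parent):
    """H(i): node i together with all its ancestors up to the root."""
    nodes = []
    while i is not None:
        nodes.append(i)
        i = parent[i]
    return nodes

def smallest_tree(x, parent):
    """T(x) as in (13): union of the histories of all nonzero entries."""
    tree = set()
    for i, xi in x.items():
        if xi != 0:
            tree.update(history(i, parent))
    return tree

def sparsity(x):
    """S(x) as in (4): number of nonzero entries."""
    return sum(1 for xi in x.values() if xi != 0)

# A 7-node binary tree with root 1 and children 2i, 2i+1 (our indexing).
parent = {1: None, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3}
connected = {1: 1.0, 2: -2.0, 3: 0.0, 4: 0.0, 5: 0.5, 6: 0.0, 7: 0.0}
scattered = {1: 1.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.3}
```

For `connected` the nonzero entries already form a tree, so the tree size equals the sparsity (both are 3). For `scattered` the tree must also include node 3 to reach node 7, so the tree size (3) exceeds the sparsity (2).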



While current inverse problems rely only on the sparse prior P1, we observe that the tree characteristic P2 offers an additional important prior. This property P2 leads us to replace the term S(x) in (5) by $S_T(x)$ and form the following problem:

$$\min_x S_T(x) \quad \text{subject to} \quad Ax = b. \tag{15}$$

The objective of (15) is to find the smallest connected tree containing all the nonzero entries of an x that solves Ax = b. We refer to this problem as the sparse-tree inverse problem, in connection to the sparse inverse problem in (5).

Problem (15) is applicable to compressible signals that have well connected significant entries in a

certain domain. The most typical and representative cases are piecewise-smooth signals in the wavelet

domain. Problem (15) can also be applied to signals whose structures approximate connected trees, as

will be seen later in numerical experiments.

C. Justifying the Sparse-Tree Prior Using a Coding Argument

The term $S_T(x)$ represents the indexing cost of x using the sparse-tree structure. A major component in the complexity (for example, in coding) of a sparse vector x is the indexing cost for the locations of its nonzero entries. In fact, the main goal of current reconstruction algorithms in compressed sensing is to recover the locations of these nonzero entries.

With the sparse prior alone, for a length-N vector x that has S(x) nonzero entries, we need $\log_2 N$ bits to specify the location of each nonzero entry. Hence, the total indexing cost under the sparse prior is

$$\text{IndexingCost}_{\text{sparse}}(x) = S(x) \cdot \log_2 N. \tag{16}$$

With the sparse-tree prior, as in practical wavelet-based coding schemes, we can efficiently specify nonzero entries along a tree, in which each nonzero entry can be accessed through its ancestors [7]. For example, for the 1-D wavelet transform shown in Figure 2, we need only two bits for each nonzero coefficient to code the four possibilities that this coefficient has {no child, one child on the left, one child on the right, or two children} that are nonzero. With tree-based indexing, to specify the locations of all nonzero coefficients of x we need to expand to the smallest connected tree T(x) as defined in (13). Thus, the total indexing cost under the sparse-tree prior is

$$\text{IndexingCost}_{\text{sparse-tree}} = S_T(x) \cdot 2. \tag{17}$$

The indexing cost with the sparse-tree prior depends on $S_T(x)$ and the role of T(x). If the nonzero entries of x are well-connected on a tree then $S_T(x) = S(x)$, and hence the indexing cost with the sparse-tree prior



(17) is significantly smaller than the indexing cost with sparse prior (16). This significant reduction in

indexing cost motivates our formulation of the sparse-tree inverse problem (15). It can also be motivated

from the minimum description length principle [27].
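To give a feel for the size of the reduction, the two costs (16) and (17) can be evaluated for the signal dimensions of Figure 1. The numbers below are an illustration only; they assume that all K = 32 significant coefficients of the length N = 256 signal lie on a connected tree, so that the tree size equals the sparsity. The function names are ours.

```python
import math

def indexing_cost_sparse(num_nonzero, n):
    """Cost (16): log2(N) bits for the location of each nonzero entry."""
    return num_nonzero * math.log2(n)

def indexing_cost_sparse_tree(tree_size):
    """Cost (17): two bits for each node of the smallest connected tree."""
    return 2 * tree_size

bits_sparse = indexing_cost_sparse(32, 256)      # 32 * 8 bits
bits_tree = indexing_cost_sparse_tree(32)        # 32 * 2 bits
```

Under this assumption the sparse prior needs 256 bits of indexing while the sparse-tree prior needs 64 bits, a factor of $\log_2 N / 2 = 4$.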

IV. TREE-BASED ORTHOGONAL MATCHING PURSUIT

A. Algorithm Description

All methods discussed in Section II-B only exploit the sparsity property P1 of a signal x in the problem Ax = b. We now present an improved version of OMP, named Tree-Based Orthogonal Matching Pursuit (TOMP), that additionally exploits the tree property P2 of x.

The inputs of TOMP are an M × N measurement matrix A where M ≪ N, a data vector b of length M, and two tuning coefficients: the relaxing coefficient α (α ∈ [0, 1]) and the downward extending coefficient d (d ∈ Z, d ≥ 1), as will be explained later. TOMP returns a reconstruction vector x of length M which is the solution of (15) and has sparse tree structure.

The key idea of TOMP is to greedily select, from a limited search space of A, sets of columns corresponding to branches of the tree T(x) defined in (13). Let $\Lambda_k$ be the accumulated selected set, which stores the indexes of the columns $a_i$ of A selected from the beginning until iteration step k. The initial selected set $\Lambda_1$ consists of the indexes of entries which are known to be significant and at the roots of the tree. For example, in the case of the wavelet transform, all scaling coefficients are significant and

$$\Lambda_1 = \{\text{indexes of all scaling coefficients}\}.$$

The initial residual is

$$r_1 = b - P_{\operatorname{span}\{a_i : i \in \Lambda_1\}} b. \tag{18}$$

For each iteration step k (k = 2, 3, . . .), the TOMP algorithm first forms a candidate set $C_k$ that is restricted to the d-level descendants of the already selected nodes,

$$C_k = \bigcup_{i \in \Lambda_{k-1}} D_d(i), \tag{19}$$

where $D_d(i)$ denotes all descendants of i within d levels on the tree. Hence d is named the downward extending coefficient.

The TOMP algorithm then evaluates the inner products of the residual $r_{k-1}$ with the atoms $a_i$ in $C_k$ and selects the atoms with the largest inner products as the finalist set $F_k$:

$$F_k = \{i : i \in C_k \ \text{s.t.}\ |\langle r_{k-1}, a_i \rangle| \ge \alpha \max_{j \in C_k} |\langle r_{k-1}, a_j \rangle|\}, \tag{20}$$



where α is a given relaxing coefficient.
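The candidate set (19) and the finalist set (20) can be sketched as follows. For illustration we assume a binary tree whose nodes are numbered 1 to `n_nodes` with the children of node i at 2i and 2i + 1 (our indexing, not the paper's), that already selected nodes are excluded from the candidate set, and that the inner products with the residual are supplied in a dictionary `corr`.

```python
def descendants_within(i, d, n_nodes):
    """D_d(i): descendants of node i at most d levels below it."""
    result, frontier = [], [i]
    for _ in range(d):
        frontier = [c for j in frontier for c in (2 * j, 2 * j + 1)
                    if c <= n_nodes]
        result.extend(frontier)
    return result

def candidate_set(selected, d, n_nodes):
    """C_k as in (19), minus the nodes that are already selected."""
    cand = set()
    for i in selected:
        cand.update(descendants_within(i, d, n_nodes))
    return cand - set(selected)

def finalist_set(candidates, corr, alpha):
    """F_k as in (20): candidates within a factor alpha of the best one."""
    best = max(abs(corr[j]) for j in candidates)
    return {i for i in candidates if abs(corr[i]) >= alpha * best}
```

With nodes 1 and 2 selected and d = 1 on a 7-node tree, the candidates are nodes 3, 4, and 5. If their inner products are 0.9, −1.0, and 0.2, then α = 0.8 keeps nodes 3 and 4 as finalists, while α = 1 keeps only node 4.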

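The next selected entries are determined based on the whole history H(i):

$$i_k = \arg\min_{i \in F_k} \|b - P_{\operatorname{span}\{a_j : j \in \Lambda_{k-1} \cup H(i)\}} b\|_2. \tag{21}$$

Finally, the selected set and residual are updated as

$$\Lambda_k = \Lambda_{k-1} \cup H(i_k), \tag{22}$$

$$r_k = b - P_{\operatorname{span}\{a_i : i \in \Lambda_k\}} b. \tag{23}$$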

Steps (19) to (23) are repeated for each iteration until we encounter a stopping criterion. One possible stopping criterion is when the number of selected items in $\Lambda_k$ reaches a predefined portion (e.g., half) of the number of rows in A. Intuitively, we cannot expect to recover a signal with more degrees of freedom than the number of measurements. Another stopping rule is $\|r_k\|_2^2 \le \varepsilon$. When x is not exactly sparse, ε can be an optionally selected limit with a very small value to eliminate the insignificant coefficients, as a means of lossy compression. In the noisy case, the value of ε is based on the noise level in the measurement data. For example, when b is affected by additive white Gaussian noise of variance $\sigma^2$, ε can be chosen as $N\sigma^2$, the noise energy, where N is the length of b.

After the last TOMP iteration, the nonzero coefficients of the estimated signal x are indexed by $\Lambda_k$ and solved by

$$\sum_{\lambda \in \Lambda_k} x_\lambda a_\lambda = P_{\operatorname{span}\{a_i : i \in \Lambda_k\}} b. \tag{24}$$

B. Comparison between TOMP and OMP

Since TOMP selects entries by expanding a selection tree, the final selected set is a connected sparse-tree. Moreover, only tree branches that lead to the smallest residual via orthogonal projection are selected at each iteration.

Equations (19), (21), and (22) mark the major differences between OMP and our proposed TOMP. A slight modification of the OMP iteration (6)-(7) can be expressed as

$$i_k = \arg\min_{1 \le i \le N} \|b - P_{\operatorname{span}\{a_j : j \in \{i_1, i_2, \ldots, i_{k-1}\} \cup \{i\}\}} b\|_2. \tag{25}$$

Thus, comparing (25) with (21), we see that at each iteration OMP expands the selected set by one entry that minimizes the resulting residual, whereas TOMP expands it by a group (a tree branch) of nodes. In the formulation (15), if a node i is in the set of significant coefficients then so are all nodes in H(i). Hence, TOMP is more robust to noise and to a limited number of measurements. In addition,



in each iteration TOMP adds a group of entries into the selected set $\Lambda_k$; thus TOMP requires fewer iterations than OMP.

The candidate set resulting from (19) restricts the search space for TOMP to only the nodes growing out from the root nodes, instead of the whole set of nodes as is the case for OMP in (25). The significant coefficients in x form a connected tree from the roots. The limited search space helps TOMP to reduce the number of comparisons in each iteration and to focus on the coefficients with a high probability of being significant.

Thus d ∈ Z, d ≥ 1 and α ∈ [0, 1] are tuning parameters. A larger d leads to larger candidate sets so that we can reach further down to significant coefficients of x, but at a higher computational cost. The relaxation parameter α allows further restriction of the search space to the finalist set by quick evaluations of inner products in (20) instead of costly evaluations of the residual norms in (21). A smaller value of α leads to bigger finalist sets, which means more accurate selection, but also at a higher computational cost. The following are some special cases for d and α:

• d = ∞ means the search space contains all nodes.

• α = 0 leads to an exhaustive search of all possible history sets within the candidate set to determine the one leading to the smallest residual.

• d = 1 means only one new node, which is directly connected to the already selected set, is selected at each iteration. In this case the selection step (21) of TOMP can be achieved by evaluating inner products with the residual $r_{k-1}$, and thus it is most efficient to set α = 1.

• In general, α = 1 means only one finalist at each iteration. In this case TOMP is also almost like OMP except that TOMP selects a whole set $H(i_k)$ rather than only a single $i_k$. If the signal satisfies our assumption P2, this modification leads to the correct reconstruction since the coefficients in H(i) are significant whenever coefficient i is significant.

Next, we will present two implementations of the key selection step (21) of TOMP, using Gram-Schmidt orthogonalization (TOMP-GS) and Least Square (TOMP-LS).

C. TOMP with Gram-Schmidt Implementation

Similar to OMP, the implementation of TOMP in this section relies heavily on the Gram-Schmidt orthogonalization; therefore we name it the TOMP-GS version. Whenever a new atom $a_i$ is selected, i is stored into the selected set $\Lambda_k$. At the same time, $a_i$ is orthogonalized with respect to all already selected atoms and then stored in the set $U_k$, namely the Gram-Schmidt selected set. $U_k$ is a set of atoms, not a set of indexes. In particular, the atoms $a_{\Lambda_1}$ are Gram-Schmidt orthogonalized and put in $U_1$.



TOMP-GS follows equations (18)-(20). To solve (21), for each $i \in F_k$, we form $\{a_{H(i) \cap C_k}\}$, called the subtree corresponding to i, which lies entirely inside the search space $C_k$. We orthogonalize each vector in the subtree against the selected set $U_{k-1}$ and the other vectors in $\{a_{H(i) \cap C_k}\}$ to form the orthogonalized subtree $\{\tilde{a}_{H(i) \cap C_k}\}$ using the Gram-Schmidt algorithm.

We select the subtree that gives the largest projection of the current residual,

$$i_k = \arg\max_{i \in F_k} \|P_{\operatorname{span}\{\tilde{a}_{H(i) \cap C_k}\}} r_{k-1}\|_2, \tag{26}$$

where $a_{i_k}$ is the lowest node on the selected subtree. This gives the solution of (21).

The selected set $\Lambda_k$ and the new residual $r_k$ are updated through (22) and (23). Equation (23) is performed by deducting from the residual its projection onto the selected subtree:

$$r_k = r_{k-1} - P_{\operatorname{span}\{\tilde{a}_{H(i_k) \cap C_k}\}} r_{k-1}. \tag{27}$$

The selected set $U_k$ is updated by adding the selected orthogonalized subtree:

$$U_k = U_{k-1} \cup \{\tilde{a}_{H(i_k) \cap C_k}\}. \tag{28}$$

We terminate TOMP-GS when the stopping rules discussed in Section IV-A are satisfied.

D. TOMP with Least-Square Implementation

To overcome the computational cost of Gram-Schmidt and the storage cost of a huge measurement matrix A, we introduce another implementation of TOMP in this section. This implementation uses an iterative method called Least Square (LSQR) [28] to solve for least-squares solutions, so this version is called TOMP-Least Square (TOMP-LS).

For large signals such as images, the measurement matrix A can be implemented via a fast algorithm instead of matrix multiplication, as will be discussed below. Suppose that the signal y is measured by collecting a small number of coefficients in a transform domain:

$$b = \mathbf{M} y = S F y, \tag{29}$$

where F is the transform matrix with a fast algorithm (such as the fast Fourier transform), and S is a selection matrix that collects a small number of coefficients in Fy to form b.

Thus the measurement matrix for the sparse vector x is

$$A = \mathbf{M} W = S F W, \tag{30}$$

which can be implemented efficiently using the FFT and DWT.



In the above scheme, it is only possible to perform multiplications with the whole matrix A. The following are some techniques to perform computations on specific columns of A indexed by the set Λ.

Suppose we want to compute the inner products between a residual r and some columns of A:

$$c_i = |\langle r, a_i \rangle| \quad \text{where } i \in \Lambda. \tag{31}$$

Since the $a_i$ are the columns of A, we multiply the matrix $A^T$ by the vector r to get the vector $c = A^T r$, then extract the required values $c_i$ at locations i ∈ Λ in c as

$$c_i = c[i] \quad \text{where } i \in \Lambda,\ c = A^T r. \tag{32}$$

Suppose we want to multiply the columns of A indexed by Λ with a vector z of the same size as Λ. We create a zero vector $\tilde{z}$ of size equal to the number of columns of A. We copy the coefficients of z to locations i ∈ Λ in $\tilde{z}$ and perform the matrix multiplication

$$A_\Lambda z = A \tilde{z} \quad \text{where } \tilde{z}[\Lambda] = z,\ \tilde{z}[\bar{\Lambda}] = 0, \tag{33}$$

where $A_\Lambda$ is the matrix containing all the columns of A indexed by Λ. The vector $\tilde{z}[\bar{\Lambda}]$ contains the entries of $\tilde{z}$ which are not indexed by Λ.
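Both techniques only ever call full products with A and its transpose, which is what makes them compatible with fast transforms. The sketch below illustrates this with plain dense products standing in for the fast operators; the inner products with the columns of A are read off from the entries of $A^T r$, and the column-subset product is obtained by zero padding. The function names and the 0-based index convention are ours.

```python
def matvec(A, x):
    """Stand-in for a fast routine computing A x (A given as a list of rows)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def rmatvec(A, r):
    """Stand-in for a fast routine computing A^T r."""
    return [sum(A[m][n] * r[m] for m in range(len(A)))
            for n in range(len(A[0]))]

def column_inner_products(A, r, index_set):
    """|<r, a_i>| for i in the index set, using one full product A^T r."""
    c = rmatvec(A, r)
    return {i: abs(c[i]) for i in index_set}

def multiply_selected_columns(A, z, index_set):
    """A_Lambda z computed as A z_tilde, with z_tilde zero outside Lambda."""
    z_tilde = [0.0] * len(A[0])
    for zi, i in zip(z, index_set):
        z_tilde[i] = zi
    return matvec(A, z_tilde)

A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
```

For this small A, selecting columns 0 and 2 with weights (1, −1) gives the same result as subtracting the third column from the first.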

Now we are ready to describe TOMP-LS. Since we do not have direct access to each single column of A, we only work with the index sets $\Lambda_k$, $C_k$, and $F_k$. The notations in this section follow exactly those in Section IV-A.

In each iteration k (k = 2, 3, . . .), TOMP-LS follows (19)-(20) to form the finalist set $F_k$. The selection step (21) is implemented through two steps. For each $i \in F_k$, we use LSQR to compute

$$t_i^* = \arg\min_t \|A_{\Lambda_{k-1} \cup H(i)} t - b\|_2, \quad i \in F_k. \tag{34}$$

We then select the subtree that gives the smallest $\ell_2$-norm distance to the data b, denoted by $i_k$, the lowest-level node on the subtree:

$$i_k = \arg\min_{i \in F_k} \|A_{\Lambda_{k-1} \cup H(i)} t_i^* - b\|_2. \tag{35}$$

The selected set $\Lambda_k$ is updated as in (22). The estimate of x after iteration k is

$$x^{(k)} = t_{i_k}^*, \tag{36}$$

where $x^{(k)} \in \mathbb{R}^{|\Lambda_k|}$ contains the estimated nonzero coefficients of x at locations $\Lambda_k$. Hence the updated residual is

$$r_k = b - A_{\Lambda_k} x^{(k)}. \tag{37}$$

TOMP-LS stops when $\|r_k\|_2^2 \le \varepsilon$. The preselected value ε is determined based on the noise level.



V. TREE-BASED MAJORIZATION-MINIMIZATION

A. Majorization-Minimization Approach

A Lagrange multiplier can be used to reformulate the inverse problem with sparse-tree prior into the following unconstrained optimization problem:

$$\min_x C(x) \quad \text{where } C(x) = \|Ax - b\|_2^2 + \mu \cdot S_T(x). \tag{38}$$

In this section we will follow the Majorization-Minimization (MM) approach [15], [16], [17], [29] in solving (38). The first step is to majorize C(x) with the following (so-called surrogate) function:

$$Q(x|y) = \|Ax - b\|_2^2 + \lambda \|x - y\|_2^2 - \|A(x - y)\|_2^2 + \mu \cdot S_T(x). \tag{39}$$

By choosing λ bigger than the maximum eigenvalue of $A^T A$ we ensure that Q(x|y) ≥ C(x) for all x, y, with equality if and only if x = y. The MM approach amounts to generating a sequence of estimates $x^{(t)}$ by iteratively minimizing the majorized function:

$$x^{(t+1)} = \arg\min_x Q(x|x^{(t)}). \tag{40}$$

The MM iteration monotonically decreases the cost function since

$$C(x^{(t+1)}) \le Q(x^{(t+1)}|x^{(t)}) \le Q(x^{(t)}|x^{(t)}) \le C(x^{(t)}). \tag{41}$$

As a result, the MM iteration is guaranteed to converge to a locally optimal solution. We can rewrite Q(x|y) in (39) as

$$Q(x|y) = \lambda \left( \|x\|_2^2 - 2 \left( y + \lambda^{-1} A^T (b - Ay) \right)^T x \right) + \mu \cdot S_T(x) + \left( \|b\|_2^2 + \lambda \|y\|_2^2 - \|Ay\|_2^2 \right). \tag{42}$$

Thus, by denoting

$$z = x^{(t)} + \lambda^{-1} A^T (b - A x^{(t)}), \tag{43}$$

(40) is equivalent to

$$x^{(t+1)} = \arg\min_x \left( \lambda \|x - z\|_2^2 + \mu \cdot S_T(x) \right). \tag{44}$$
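The step from (42) to (44) follows by completing the square in x. As a brief sketch, set $y = x^{(t)}$ in (42), take z as in (43), and let c denote (in our notation) the terms of (42) that do not depend on x:

```latex
\begin{aligned}
Q(x|x^{(t)}) &= \lambda \left( \|x\|_2^2 - 2 z^T x \right) + \mu \cdot S_T(x) + c \\
             &= \lambda \|x - z\|_2^2 - \lambda \|z\|_2^2 + \mu \cdot S_T(x) + c .
\end{aligned}
```

Since $-\lambda \|z\|_2^2 + c$ is constant in x, minimizing $Q(x|x^{(t)})$ over x is the same as minimizing $\lambda \|x - z\|_2^2 + \mu \cdot S_T(x)$.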

Equations (43) and (44) make up the two steps in each iteration of a tree-based majorize-minimize

(TMM) algorithm. The advantage of the TMM algorithm is that it does not need to store the matrix A



but instead it only requires fast algorithms to compute matrix multiplications by A and $A^T$ for the first step (43). In the next section we will develop an efficient algorithm to solve the second step (44).

Notice that the MM approach allows any other regularization term to be used instead of $S_T(x)$ in the optimization problem (38). In compressed sensing, $\|x\|_0$ or $\|x\|_1$ regularization is typically used [15], [16], [17], [29] for promoting sparse solutions. For example, with the $\|x\|_1$ regularization term in place of $S_T(x)$, the second step (44) becomes the component-wise soft thresholding

$$x_n^{(t+1)} = \operatorname{sign}(z_n) \cdot \max\{0, |z_n| - \mu/\lambda\}. \tag{45}$$

We refer to the iteration with (43) and (45) as the MM algorithm for the inverse problem with sparse prior, as compared to TMM.

B. Fast Best Search in Trees

Let θ = µ/λ. Our task in solving the second step in TMM (44) is to minimize the following objective function for a given z:

$$J(x) = \theta \cdot S_T(x) + \|x - z\|_2^2. \tag{46}$$

Recall that the sparse-tree prior term $S_T(x)$ is defined as the number of nodes in the smallest connected tree T(x) that contains all nonzero entries of x:

$$T(x) = \bigcup_{x_i \neq 0} H(i). \tag{47}$$

Denote by $T^c(x)$ the set of the remaining entries in x, and by |T(x)| the number of nodes in T(x). Since $x_i = 0$ for $i \in T^c(x)$, we have

$$J(x) = \theta \cdot |T(x)| + \sum_{i \in T(x)} (x_i - z_i)^2 + \sum_{i \in T^c(x)} (x_i - z_i)^2 \ge \theta \cdot |T(x)| + \sum_{i \in T^c(x)} z_i^2, \tag{48}$$

with equality if and only if $x_i = z_i$ for all $i \in T(x)$.

Thus solving (44) amounts to searching for the best tree $T^*$, starting from the root node, that minimizes

$$J(T) = \theta \cdot |T| + \sum_{i \in T^c} z_i^2. \tag{49}$$



The solution $x^*$ of (44) can be obtained from the minimizer $T^*$ of (49) as

$$x_i^* = \begin{cases} z_i & \text{if } i \in T^*, \\ 0 & \text{else.} \end{cases} \tag{50}$$

We now develop a fast dynamic programming algorithm that iterates from the root of the tree to the leaves to search for the minimizer tree $T^*$ of (49). This algorithm is similar to the best basis search algorithm for wavelet packets [30].

For a node n on the index tree of z, let us denote by τ(n) the set of all trees that have their root at n, including the empty tree (denoted by ∅), and by FT(n) the full tree that has its root at n and grows all the way to the bottom. We define

$$T^*(n) = \arg\min_{T \in \tau(n)} \Big( \theta \cdot |T| + \sum_{i \in \mathrm{FT}(n) \setminus T} z_i^2 \Big), \tag{51}$$

$$J^*(n) = \theta \cdot |T^*(n)| + \sum_{i \in \mathrm{FT}(n) \setminus T^*(n)} z_i^2, \tag{52}$$

$$S(n) = \sum_{i \in \mathrm{FT}(n)} z_i^2. \tag{53}$$

The following proposition gives a recursive computation of these quantities, from the bottom up along tree branches. The final solution of (49) is found at the root of the tree.

Proposition 1 Let C(n) be the set of children of n, with C(n) = ∅ if n is a leaf node. Then

$$S(n) = z_n^2 + \sum_{c \in C(n)} S(c). \tag{54}$$

If $S(n) < \theta + \sum_{c \in C(n)} J^*(c)$ then

$$T^*(n) = \emptyset, \tag{55}$$

$$J^*(n) = S(n). \tag{56}$$

Otherwise

$$T^*(n) = n \cup \bigcup_{c \in C(n)} T^*(c), \tag{57}$$

$$J^*(n) = \theta + \sum_{c \in C(n)} J^*(c). \tag{58}$$

Proof: The best tree $T^*(n)$ minimizes the following cost function

$$J_n(T) = \theta \cdot |T| + \sum_{i \in \mathrm{FT}(n) \setminus T} z_i^2 = \sum_{i \in T} \theta + \sum_{i \in \mathrm{FT}(n) \setminus T} z_i^2, \tag{59}$$



among all T ∈ τ(n). Such a T can be either an empty set or a tree that has its root at n. In this second case, T consists of n and subtrees rooted at the children of n:

$$T = n \cup \bigcup_{c \in C(n)} T_c \tag{60}$$

for some $T_c \in \tau(c)$, and thus

$$J_n(T) = \theta + \sum_{c \in C(n)} J_c(T_c). \tag{61}$$

In this case, the cost $J_n(T)$ is minimum if each subtree $T_c$ minimizes its cost $J_c(T_c)$. Hence $T^*(n)$ is either an empty set or $n \cup \bigcup_{c \in C(n)} T^*(c)$. The best tree $T^*(n)$ is obtained by comparing the costs of these two possibilities.

The recursive algorithm described in Proposition 1 visits each node in z once, where it takes a constant number of operations. Thus, the complexity of this algorithm for the second step in each TMM iteration is O(M), where M is the length of x or the signal y.
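The recursion of Proposition 1 can be sketched as a single post-order traversal. The code below is an illustration under our own conventions: the tree is given by a hypothetical `children` dictionary, z is a dictionary from node to value, and the second cost uses the form θ plus the sum of the children's optimal costs, as in (61).

```python
def best_tree(z, children, root, theta):
    """Return (T*, J*) for the tree rooted at `root`, following Proposition 1."""
    def visit(n):
        energy = z[n] ** 2        # accumulates S(n) as in (54)
        keep_cost = theta         # accumulates theta + sum of J*(c)
        keep_tree = {n}           # accumulates n together with the T*(c)
        for c in children.get(n, ()):
            s_c, j_c, t_c = visit(c)
            energy += s_c
            keep_cost += j_c
            keep_tree |= t_c
        if energy < keep_cost:    # dropping the whole subtree is cheaper
            return energy, energy, set()
        return energy, keep_cost, keep_tree

    _, j_star, t_star = visit(root)
    return t_star, j_star

# 7-node binary tree with root 1 (our indexing) and a sample vector z.
children = {1: [2, 3], 2: [4, 5], 3: [6, 7]}
z = {1: 3.0, 2: 0.1, 3: 2.0, 4: 0.1, 5: 0.0, 6: 0.0, 7: 1.5}
tree, cost = best_tree(z, children, 1, theta=1.0)
x_star = {i: (z[i] if i in tree else 0.0) for i in z}   # solution (50)
```

In this example the small coefficients under node 2 are discarded as a group, while nodes 1, 3, and 7 are kept, giving a cost of about 3.02. Each node is visited once, in line with the linear complexity stated above.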

VI. EXPERIMENTAL RESULTS

A. TOMP

Throughout this section, the test signals are piecewise-smooth signals as in Figure 4 (a). An 8-level

Daubechies 4 wavelet transform is applied to the test signals to get sparse-tree signals in the wavelet

domain. The signals are then measured by matrices of i.i.d. Gaussian entries (zero mean and variance 1)

of which all columns are normalized to unit norm.

In the first experiment, we count the number of multiplications from inner products performed by OMP and TOMP-GS at different values of α and d. The piecewise smooth signal of length 4096 is reconstructed from 300 measurements. Figure 3 shows that large α and small d lead to a small number of multiplications. This is because the size of the search space is reduced. When α ≥ 0.7 and l ≤ 3, TOMP-GS requires at most about $2.9 \times 10^7$ multiplications while OMP requires about $9.7 \times 10^7$ multiplications. Hence the limited search space helps TOMP-GS to reduce computational costs.

Through the experiments using the above test signal and wavelet transform, we observe that TOMP-GS gives high reconstruction SNR with low computational cost at α = 0.975 and d = 2. From now on, the TOMP-GS experiments are carried out using these two values.

A single reconstruction case is shown in Figure 4, using 300 measurements from a length-4096 piecewise smooth signal. All existing methods, OMP, StOMP, and BP, fail to reconstruct the signal, only giving 17.02 dB, 13.10 dB, and 11.41 dB, respectively. TOMP-GS and TOMP-LS provide 25.25 dB and 27.72 dB


17

0.5 0.6 0.7 0.8 0.9 10

1

2

3

4

5

6

7

8

9

10x 10

7

Relaxing coeff. alphaN

umbe

r of

mul

tiplic

atio

ns

l = 1l = 2l = 3

OMP: 9.71x107

multiplications

Fig. 3. OMP-GS compared to OMP in the number of multiplications from inner products when varyingα andd.

reconstruction, which improves more 10 dB over existing methods. As shown in Figure 4 (k-l), TOMP-

GS and TOMP-LS can preserve the tree structure of the original signal while existing methods OMP,

StOMP, and BP cannot. Moreover, TOMP-GS requires only about6% of the number of multiplications

required by OMP.

We next reconstruct the piecewise-smooth signal of length 4096 from different numbers of measurements using OMP, StOMP, BP, TMP [14], TOMP-GS, and TOMP-LS. For each number of measurements, we average the reconstruction SNR over 10 trials with different randomly generated measurement matrices M. We focus on the regime of small numbers of measurements and compare the SNRs resulting from TOMP and the other methods. Figure 5 shows the comparisons, which again demonstrate significant gains for the proposed TOMP methods.
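The reconstruction SNR is not defined in this section. The sketch below assumes the common definition $\mathrm{SNR} = 10 \log_{10}\left(\|x\|_2^2 / \|x - \hat{x}\|_2^2\right)$ and averages the dB values over trials, as the experiment describes. The function names `snr_db` and `average_snr_db` are illustrative.

```python
import math


def snr_db(x, x_hat):
    """Reconstruction SNR in dB, assuming 10*log10(||x||^2 / ||x - x_hat||^2)."""
    signal = sum(v * v for v in x)
    error = sum((v - w) ** 2 for v, w in zip(x, x_hat))
    if error == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal / error)


def average_snr_db(x, reconstructions):
    """Mean of the per-trial SNR values (in dB) for the same signal x."""
    return sum(snr_db(x, r) for r in reconstructions) / len(reconstructions)


x = [3.0, 4.0]
print(snr_db(x, [3.0, 3.5]))  # signal energy 25, error energy 0.25 -> 20.0
```

Averaging in dB, as done here, is one possible reading of "average the reconstruction SNR"; averaging the error energies before converting to dB would give a different number.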

The experiments described above demonstrate that TOMP can provide better and faster reconstruction than methods that rely only on sparse representations. Like OMP, TOMP-GS incurs the cost of Gram-Schmidt orthogonalization and the cost of storing the selected columns of $A$. When signals are large, such as images, the measurement matrix $A$ may not even fit in memory. For example, reconstructing an image of size $256 \times 256$ from a number of measurements equal to only 10% of its size requires a $6554 \times 65536$ matrix $A$, which takes about 3.4 GB of memory in double precision. It is therefore impractical to extract single columns of $A$ for comparison and orthogonalization and to store all selected columns. TOMP-GS can be used to reconstruct signals a few thousand entries long. For larger signals, we have to resort to methods like TOMP-LS and TMM, which use fast algorithms for computing matrix multiplications with $A$.
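The storage figure in the example above can be checked with a few lines of arithmetic. The helper name is hypothetical, and 8 bytes per entry is the size of an IEEE double.

```python
def dense_matrix_bytes(num_rows, num_cols, bytes_per_entry=8):
    """Storage for a dense matrix; 8 bytes per entry is IEEE double precision."""
    return num_rows * num_cols * bytes_per_entry


signal_length = 256 * 256                        # 65536 pixels
num_measurements = round(0.10 * signal_length)   # 6554
total = dense_matrix_bytes(num_measurements, signal_length)
print(num_measurements, total)                   # 6554 3436183552
print(total / 1e9, total / 2**30)                # about 3.44 GB, about 3.20 GiB
```

This counts only the dense matrix itself; the selected columns and any orthogonalized copies kept by a Gram-Schmidt based method come on top of it.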


[Figure 4 panels: (a) Original signal; (b) OMP (17.02 dB); (c) StOMP (13.10 dB); (d) BP (11.41 dB); (e) Original tree; (f) OMP tree; (g) StOMP tree; (h) BP tree; (i) TOMP-GS (25.25 dB); (j) TOMP-LS (27.72 dB); (k) TOMP-GS tree; (l) TOMP-LS tree.]

Fig. 4. An example piecewise-smooth signal of length 4096 and its reconstructions from 300 linear measurements using OMP, StOMP, BP, TOMP-GS, and TOMP-LS, together with tree structures showing the locations of significant coefficients recovered by those methods.

When the Bumps signal of length 4096 is used as the test signal, TOMP-LS returns a reconstruction SNR of 48.26 dB from 600 samples, far better than StOMP (16.21 dB) and BP (11.61 dB).

[Figure 5: reconstruction SNR (dB) versus number of measurements (0 to 1200), with curves for TOMP-LS, TOMP-GS, StOMP, OMP, TMP, and BP.]

Fig. 5. Comparisons between different reconstruction methods of a piecewise-smooth signal using BP, OMP, StOMP, TMP [14], TOMP-GS, and TOMP-LS.

[Figure 6 panels: (a) Original signal; (b) BP (11.61 dB); (c) StOMP (16.21 dB); (d) TOMP-LS (48.26 dB); (e) Original tree; (f) BP tree; (g) StOMP tree; (h) TOMP-LS tree.]

Fig. 6. The Bumps signal of length 4096 and its reconstructions from 600 linear measurements using BP, StOMP, and TOMP-LS, together with tree structures showing the locations of significant coefficients recovered by those methods.

The size of the selected set in TOMP-LS is not restricted by computation and storage limitations. In addition, TOMP-LS recomputes the values of all significant entries after each iteration. When $x_i$ and $x_j$ are both nonzero, the corresponding columns $a_i$ and $a_j$ can be parallel. TOMP-GS can add both of them to the selected set; however, one of them is reduced to zero by the Gram-Schmidt orthogonalization. TOMP-LS overcomes this limitation by recomputing the solution with LSQR at each iteration.
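The effect of Gram-Schmidt on parallel columns, mentioned above for TOMP-GS, can be seen in a small sketch. The two columns and the helper names are hypothetical. The sketch shows only the Gram-Schmidt side; the LSQR recomputation used by TOMP-LS is not reproduced here.

```python
def gram_schmidt_step(basis, v):
    """Subtract from v its projections onto the orthonormal vectors in basis."""
    r = list(v)
    for q in basis:
        coeff = sum(qi * ri for qi, ri in zip(q, r))
        r = [ri - coeff * qi for qi, ri in zip(q, r)]
    return r


def normalize(v):
    """Scale v to unit l2 norm."""
    norm = sum(vi * vi for vi in v) ** 0.5
    return [vi / norm for vi in v]


a_i = [1.0, 2.0, 2.0]
a_j = [2.0, 4.0, 4.0]            # parallel to a_i (hypothetical columns)
q_i = normalize(a_i)             # first selected column, normalized
residual = gram_schmidt_step([q_i], a_j)
print(max(abs(r) for r in residual))  # numerically zero
```

Because the orthogonalized version of the second column vanishes, it contributes no new direction, and the second column cannot receive a coefficient of its own through the orthogonalized basis.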

For the Sinc and Doppler test signals, the tree structures are not well connected, as shown in Figures 7 and 8. Nevertheless, TOMP-LS still outperforms BP and StOMP. Although TOMP-LS selects insignificant entries to guarantee the tree structure, their values are set to zero by means of projection. Therefore, the reconstructed signal is still close to the original.


[Figure 7 panels: (a) Original signal; (b) BP (19.07 dB); (c) StOMP (20.21 dB); (d) TOMP-LS (37.25 dB); (e) Original tree; (f) BP tree; (g) StOMP tree; (h) TOMP-LS tree.]

Fig. 7. The Sinc signal of length 1020 and its reconstructions from 400 linear measurements using BP, StOMP, and TOMP-LS, together with tree structures showing the locations of significant coefficients recovered by those methods.

[Figure 8 panels: (a) Original signal; (b) BP (32.89 dB); (c) StOMP (45.89 dB); (d) TOMP-LS (58.05 dB); (e) Original tree; (f) BP tree; (g) StOMP tree; (h) TOMP-LS tree.]

Fig. 8. The Doppler signal of length 4096 and its reconstructions from 700 linear measurements using BP, StOMP, and TOMP-LS, together with tree structures showing the locations of significant coefficients recovered by those methods.

B. TMM

We use TMM to reconstruct a 1-D piecewise-smooth signal of length 4096 from 300 measurements. The measurement matrix $A$ has i.i.d. Gaussian entries (zero mean and variance 1). It can be seen from Figure 9(e) and (f) that TMM using tree-thresholding can recover the tree structure, while MM suffers from noise due to the wrong allocation of significant coefficients. As a result, TMM provides a reconstruction SNR of 20.03 dB compared to 9.89 dB from MM.

[Figure 9 panels: (a) Original; (b) MM (9.89 dB); (c) TMM (20.03 dB); (d) Original tree; (e) MM tree; (f) TMM tree.]

Fig. 9. An example piecewise-smooth signal of length 4096 and its reconstructions from 300 linear measurements using MM and TMM, together with tree structures showing the locations of the original significant coefficients and of those recovered by the two methods.

[Figure 10 panels: (a) Original; (b) MM; (c) TMM. The titles of panels (b) and (c) read "Lambda = 4136.96 Muy = 2068.48".]

Fig. 10. An example simple box image and its reconstructions from 12 radial scan lines in the frequency domain using MM and TMM.

We then use TMM to reconstruct a simple box image and compare the result with that from MM. Using the scheme in (30), we measure the box image in Figure 10(a) by collecting coefficients along 12 radial lines in its frequency domain. The results show that TMM with tree-thresholding also gives a better reconstruction than MM in this case.

In the above experiment, the image is small and simple, and the Haar wavelet, the simplest type of wavelet transform, is applied to the box image. For larger and more complicated images, we can employ different types of wavelet transforms to obtain sparse representations with a well-organized tree structure. In addition, we can use more complicated thresholding criteria, such as a different threshold for each wavelet level.
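A level-dependent criterion of the kind suggested above could look like the following sketch. It shows plain hard thresholding with one threshold per level and does not enforce any tree constraint, so it is not the tree-thresholding step of TMM. The per-level list layout, the function name, and the threshold values are all hypothetical.

```python
def threshold_by_level(coeffs_by_level, thresholds):
    """Hard-threshold wavelet coefficients with one threshold per level.

    coeffs_by_level -- list of lists, one list of coefficients per wavelet level
    thresholds      -- one non-negative threshold per level (hypothetical values)
    """
    if len(coeffs_by_level) != len(thresholds):
        raise ValueError("need exactly one threshold per level")
    return [
        [c if abs(c) > t else 0.0 for c in level]
        for level, t in zip(coeffs_by_level, thresholds)
    ]


coeffs = [[4.0, -0.3], [1.2, -0.8, 0.1, 2.5]]
print(threshold_by_level(coeffs, [0.5, 1.0]))
# [[4.0, 0.0], [1.2, 0.0, 0.0, 2.5]]
```

How the thresholds should vary across levels is a design choice that the text leaves open.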

VII. CONCLUDING REMARKS

Current compressed sensing methods employ the sparsity of unknown signals as the key prior for reconstruction. In this paper, we promote the tree structure as an additional important prior to obtain better and faster reconstruction. We formulate the sparse-tree inverse problem with the sparse-tree prior.

We introduce two reconstruction algorithms based on the sparse-tree prior, namely the TOMP algorithm and the TMM algorithm. Experimental results show that the tree-based algorithms provide significant gains over existing ones in reconstructed signal quality. In particular, TOMP provides better reconstruction than OMP, StOMP, and BP, while TMM outperforms MM.

The TOMP-GS version, based on Gram-Schmidt orthogonalization, can serve as a basic tool in later mathematical analysis. To overcome the limitations on signal size and storage cost, the TOMP-LS version is introduced. However, the running time, although much lower than that of existing algorithms, is still an issue for the TOMP algorithm. TMM appears to be the most promising algorithm in terms of signal size and running time.

REFERENCES

[1] Y. Bresler, M. Gastpar, and R. Venkataramani, “Image compression on-the-fly by universal sampling in Fourier imaging systems,” in IEEE Information Theory Workshop on Detection, Estimation, Classification, and Imaging, Santa Fe, USA, 1999.

[2] M. Vetterli, P. Marziliano, and T. Blu, “Sampling signals with finite rate of innovation,” IEEE Trans. Signal Proc., vol. 50, no. 6, pp. 1417–1428, Jun. 2002.

[3] E. J. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Info. Theory, vol. 52, pp. 489–509, Feb. 2006.

[4] D. L. Donoho, “Compressed sensing,” IEEE Trans. Info. Theory, vol. 52, pp. 1289–1306, Apr. 2006.

[5] E. J. Candes and T. Tao, “Near optimal signal recovery from random projections: Universal encoding strategies?” IEEE Trans. Info. Theory, vol. 52, pp. 5406–5425, Dec. 2006.

[6] J. A. Tropp and A. C. Gilbert, “Signal recovery from partial information via orthogonal matching pursuit,” IEEE Trans. Info. Theory, vol. 53, pp. 4655–4666, Dec. 2007.

[7] J. M. Shapiro, “Embedded image coding using zerotrees of wavelet coefficients,” IEEE Transactions on Signal Processing, Special Issue on Wavelets and Signal Processing, vol. 41, no. 12, pp. 3445–3462, Dec. 1993.

[8] A. Said and W. A. Pearlman, “A new, fast, and efficient image codec based on set partitioning in hierarchical trees,” IEEE Trans. Circuits and Systems for Video Technology, vol. 6, no. 3, pp. 243–250, Jun. 1996.

[9] M. Crouse, R. D. Nowak, and R. G. Baraniuk, “Wavelet-based signal processing using hidden Markov models,” IEEE Trans. Signal Proc. (Special Issue on Wavelets and Filterbanks), vol. 46, pp. 886–902, Apr. 1998.

[10] M. J. Wainwright, E. P. Simoncelli, and A. S. Willsky, “Random cascades on wavelet trees and their use in modeling and analyzing natural images,” Journal of Appl. and Comput. Harmonic Analysis, vol. 11, pp. 89–123, Jul. 2001.

[11] A. Cohen, W. Dahmen, I. Daubechies, and R. DeVore, “Tree approximation and optimal encoding,” Journal of Appl. and Comput. Harmonic Analysis, vol. 11, pp. 192–226, Dec. 2001.

[12] C. La and M. N. Do, “Signal reconstruction using sparse tree representation,” in Proc. of SPIE Conf. on Wavelet Applications in Signal and Image Processing, San Diego, Aug. 2005.

[13] C. La and M. N. Do, “Tree-based orthogonal matching pursuit algorithm for signal reconstruction,” in Proc. IEEE Int. Conf. on Image Proc., Atlanta, USA, Oct. 2006.

[14] M. F. Duarte, M. B. Wakin, and R. G. Baraniuk, “Fast reconstruction of piecewise smooth signals from random projections,” in Proc. of Workshop on Signal Processing with Adaptative Sparse Structured Representations, Rennes, France, Nov. 2005.

[15] I. Daubechies, M. Defrise, and C. De Mol, “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint,” Comm. Pure and Applied Math., vol. 57, pp. 1413–1457, 2004.

[16] M. Figueiredo, J. Bioucas-Dias, and R. Nowak, “Majorization-minimization algorithms for wavelet-based image restoration,” IEEE Trans. on Image Processing, vol. 16, no. 12, pp. 2980–2991, 2007.

[17] M. Elad, B. Matalon, and M. Zibulevsky, “Coordinate and subspace optimization methods for linear least squares with non-quadratic regularization,” Journal of Appl. and Comput. Harmonic Analysis, pp. 346–367, 2007.

[18] M. N. Do and C. N. H. La, “Tree-based majorize-minimize algorithm for compressed sensing with sparse-tree prior,” in Proc. Computational Advances in Multi-Sensor Adaptive Processing, U.S. Virgin Islands, 2007.

[19] R. Baraniuk, V. Cevher, M. Duarte, and C. Hegde, “Model-based compressive sensing,” Preprint, 2008.

[20] G. Davis, S. Mallat, and M. Avellaneda, “Greedy adaptive approximation,” J. Constr. Approx., vol. 13, pp. 57–98, 1997.

[21] D. L. Donoho, Y. Tsaig, I. Drori, and J.-L. Starck, “Sparse solution of underdetermined linear equations by stagewise orthogonal matching pursuit,” vol. 8, Preprint, 2007.

[22] D. Needell and J. A. Tropp, “CoSaMP: Iterative signal recovery from incomplete and inaccurate samples,” Preprint, 2008.

[23] W. Dai and O. Milenkovic, “Subspace pursuit for compressive sensing: Closing the gap between performance and complexity,” Preprint, 2008.

[24] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,” SIAM J. Sci. Comput., vol. 20, no. 1, pp. 33–61, 1999.

[25] S. Mallat, A Wavelet Tour of Signal Processing, 2nd ed. Academic Press, 1999.

[26] P. L. Dragotti and M. Vetterli, “Wavelet footprints: Theory, algorithms and applications,” IEEE Trans. Signal Proc., vol. 51, pp. 1306–1323, May 2003.

[27] A. Barron, J. Rissanen, and B. Yu, “The minimum description length principle in coding and modeling,” IEEE Trans. Info. Theory, vol. 44, no. 6, pp. 2743–2760, Oct. 1998.

[28] C. C. Paige and M. A. Saunders, “LSQR: An algorithm for sparse linear equations and sparse least squares,” ACM Trans. Math. Soft., vol. 8, pp. 43–71, 1982.

[29] T. Blumensath and M. E. Davies, “Iterative thresholding for sparse approximations,” Journal of Fourier Analysis and Applications, vol. 14, pp. 629–654, 2008.

[30] R. R. Coifman and M. V. Wickerhauser, “Entropy-based algorithms for best basis selection,” IEEE Transactions on Information Theory, Special Issue on Wavelet Transforms and Multiresolution Signal Analysis, vol. 38, no. 2, pp. 713–718, Mar. 1992.