
arXiv:1106.5037v1 [cs.IT] 24 Jun 2011

Fast and Efficient Compressive Sensing using Structurally Random Matrices

Thong T. Do, Student Member, IEEE, Lu Gan, Member, IEEE, Nam H. Nguyen and Trac D. Tran, Senior Member, IEEE

Abstract— This paper introduces a new framework of fast and efficient sensing matrices for practical compressive sensing, called Structurally Random Matrix (SRM). In the proposed framework, we pre-randomize a sensing signal by scrambling its samples or flipping its sample signs, then fast-transform the randomized samples and, finally, subsample the transform coefficients to obtain the final sensing measurements. SRM is highly relevant for large-scale, real-time compressive sensing applications as it has fast computation and supports block-based processing. In addition, we show that SRM has theoretical sensing performance comparable with that of completely random sensing matrices. Numerical simulation results verify the validity of the theory and illustrate the promising potential of the proposed sensing framework.

Index Terms— compressed sensing, compressive sensing, random projection, sparse reconstruction, fast and efficient algorithm

I. INTRODUCTION

Compressed sensing (CS) [1], [2] has attracted a lot of interest over the past few years as a revolutionary signal sampling paradigm. Suppose that x is a length-N signal. It is said to be K-sparse (or compressible) if x can be well approximated using only K ≪ N coefficients under some linear transform:

  x = Ψα,

where Ψ is the sparsifying basis and α is the transform coefficient vector that has K (significant) nonzero entries.

According to the CS theory, such a signal can be acquired through the following random linear projection:

  y = Φx + e,

where y is the sampled vector with M ≪ N data points, Φ represents an M × N random matrix and e is the acquisition noise. The CS framework is attractive as it implies that x can be faithfully recovered from only M = O(K log N) measurements, suggesting the potential of significant cost reduction in digital data acquisition.

While the sampling process is simply a random linear projection, the reconstruction that finds the sparsest signal from the received measurements is a highly non-linear process.

This work has been supported in part by the National Science Foundation under Grant CCF-0728893.

Thong T. Do, Nam Nguyen and Trac D. Tran are with the Johns Hopkins University, Baltimore, MD, 21218 USA.

Lu Gan is with the Brunel University, London, UK.

More precisely, the reconstruction algorithm solves the ℓ1-minimization of a transform coefficient vector:

  min ‖α‖₁ s.t. y = ΦΨα.

Linear programming [1], [2] and other convex optimization algorithms [3], [4], [5] have been proposed to solve the ℓ1 minimization. Furthermore, there also exists a family of greedy pursuit algorithms [6], [7], [8], [9], [10] offering another promising option for sparse reconstruction. These algorithms all need to compute ΦΨ and (ΦΨ)ᵀ multiple times. Thus, the computational complexity of the system depends on the structure of the sensing matrix Φ and its transpose Φᵀ.
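To make the ℓ1-minimization step concrete, it can be cast as a linear program by splitting α into nonnegative parts. The sketch below is not from the paper: SciPy's linprog is used as an illustrative stand-in for the dedicated solvers cited above.

import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve min ||alpha||_1 s.t. A @ alpha = y, where A = Phi @ Psi."""
    M, N = A.shape
    c = np.ones(2 * N)               # sum(u) + sum(v) = ||alpha||_1 for alpha = u - v
    res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=y, bounds=(0, None))
    return res.x[:N] - res.x[N:]     # recover alpha = u - v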

Preferably, the sensing matrix Φ should be highly incoherent with the sparsifying basis Ψ, i.e., rows of Φ should not have any sparse representation in the basis Ψ. Incoherence between two matrices is mathematically quantified by the mutual coherence coefficient [11].

Definition I.1. The mutual coherence of an N × N orthonormal matrix Φ and another N × N orthonormal matrix Ψ is defined as:

  µ(Φ, Ψ) = max_{1≤i,j≤N} |⟨Φ_i, Ψ_j⟩|,

where Φ_i are rows of Φ and Ψ_j are columns of Ψ, respectively.

If Φ and Ψ are two orthonormal matrices, ‖ΦΨ_j‖₂ = ‖Ψ_j‖₂ = 1. Thus, it is easy to see that for two orthonormal matrices Φ and Ψ, 1/√N ≤ µ ≤ 1. Incoherence implies that the mutual coherence, or the maximum magnitude of the entries of the product matrix ΦΨ, is relatively small. Two matrices are completely incoherent if their mutual coherence coefficient approaches the lower bound value of 1/√N.
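The definition is straightforward to evaluate numerically: µ(Φ, Ψ) is simply the largest-magnitude entry of ΦΨ. A minimal sketch (assuming, for illustration, a random orthonormal Φ and the identity for Ψ) that also checks the bound 1/√N ≤ µ ≤ 1:

import numpy as np

def mutual_coherence(Phi, Psi):
    # rows of Phi against columns of Psi = entries of Phi @ Psi
    return np.abs(Phi @ Psi).max()

N = 64
Phi = np.linalg.qr(np.random.randn(N, N))[0]   # a random orthonormal matrix
mu = mutual_coherence(Phi, np.eye(N))
assert 1 / np.sqrt(N) - 1e-12 <= mu <= 1 + 1e-12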

A popular family of sensing matrices is a random projection, or a random matrix of i.i.d. random variables from a sub-Gaussian distribution such as Gaussian or Bernoulli [12], [13]. This family of sensing matrices is well known because it is universally incoherent with all sparsifying bases. For example, if Φ is a random matrix of Gaussian i.i.d. entries and Ψ is an arbitrary orthonormal sparsifying basis, the sensing matrix in the transform domain ΦΨ is also a Gaussian i.i.d. matrix. The universality of a sensing matrix is important because it enables us to sense a signal directly in its original domain without significant loss of sensing efficiency and without any other prior knowledge. In addition, it can be shown that random projection approaches the optimal sensing performance of M = O(K log N).

However, it is quite costly to realize random matrices in practical sensing applications as they require very high


computational complexity and huge memory buffering due to their completely unstructured nature [14]. For example, to process a 512 × 512 image with 64K measurements (i.e., 25% of the original sampling rate), a Bernoulli random matrix requires nearly gigabytes of storage and giga-flop operations, which makes both the sampling and recovery processes very expensive and, in many cases, unrealistic.

Another class of sensing matrices is a uniformly random subset of rows of an orthonormal matrix, of which the partial Fourier matrix (or the partial FFT) is a special case [13], [14]. While the partial FFT is well known for having a fast and efficient implementation, it only works well in the transform domain or in the case that the sparsifying basis is the identity matrix. More specifically, it is shown in [14, Theorem 1.1] that the minimal number of measurements required for exact recovery depends on the incoherence of Φ and Ψ:

  M = O(µ_n² K log N)    (1)

where µ_n is the normalized mutual coherence, µ_n = √N µ, and 1 ≤ µ_n ≤ √N. With many well-known sparsifying bases such as wavelets, this mutual coherence coefficient might be large, thus resulting in performance loss. Another approach is to design a sensing matrix to be incoherent with a given sparsifying basis. For example, Noiselets are designed to be incoherent with the Haar wavelet basis in [15], i.e., µ_n = 1 when Φ is the Noiselet transform and Ψ is the Haar wavelet basis. Noiselets also have a low-complexity implementation of O(N log N), although it is unknown whether noiselets are also incoherent with other bases.

II. COMPRESSIVE SENSING WITH STRUCTURALLY RANDOM MATRICES

A. Overview

One of the remaining challenges for CS in practice is to design a CS framework that has the following features:

• Optimal or near-optimal sensing performance: the number of measurements for exact recovery approaches the minimal bound, i.e., on the order of O(K log N);

• Universality: sensing performance is equally good with almost all sparsifying bases;

• Low complexity, fast computation and block-based processing support: these features of the sensing matrix are desired for large-scale, real-time sensing applications;

• Hardware/optics implementation friendliness: entries of the sensing matrix only take values in the set {0, 1, −1}.

In this paper, we propose a framework that aims to satisfy the above wish-list, called Structurally Random Matrix (SRM), which is defined as a product of three matrices:

  Φ = √(N/M) D F R    (2)

where:
• R ∈ ℝ^{N×N} is either a uniform random permutation matrix or a diagonal random matrix whose diagonal entries R_ii are i.i.d. Bernoulli random variables with identical distribution P(R_ii = ±1) = 1/2. A uniformly random permutation matrix scrambles the signal's sample locations globally, while a diagonal matrix of Bernoulli random variables flips the signal's sample signs locally. Hence, we often refer to the former as the global randomizer and to the latter as the local randomizer.

• F ∈ ℝ^{N×N} is an orthonormal matrix that, in practice, is selected to be fast computable, such as the popular fast transforms FFT, DCT, WHT or their block-diagonal versions. The purpose of the matrix F is to spread the information (or energy) of the signal's samples over all measurements.

• D ∈ ℝ^{M×N} is a subsampling matrix/operator. The operator D selects a random subset of rows of the matrix FR. If the probability of a row being selected is M/N, the number of selected rows is M on average. In matrix representation, D is simply a random subset of M rows of the identity matrix of size N × N.

The scale coefficient √(N/M) normalizes the transform so that the energy of the measurement vector is almost the same as that of the input signal vector.

Equivalently, the proposed sensing algorithm SRM consists of three steps (sketched in code below):

• Step 1 (Pre-randomize): Randomize the target signal by either flipping its sample signs or uniformly permuting its sample locations. This step corresponds to multiplying the signal by the matrix R.
• Step 2 (Transform): Apply a fast transform F to the randomized signal.
• Step 3 (Subsample): Randomly pick M measurements out of the N transform coefficients. This step corresponds to multiplying the transform coefficients by the matrix D.
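A minimal sketch of these three steps, assuming the local randomizer and an orthonormal DCT as the fast transform F (scipy.fft.dct is an illustrative choice; any fast orthonormal transform would do):

import numpy as np
from scipy.fft import dct

def srm_sense(x, M, rng):
    N = len(x)
    r = rng.choice([-1.0, 1.0], size=N)         # Step 1: local randomizer R
    t = dct(r * x, norm='ortho')                # Step 2: fast transform F
    idx = rng.choice(N, size=M, replace=False)  # Step 3: subsampling operator D
    return np.sqrt(N / M) * t[idx], (r, idx)    # scaled measurements and the "keys"

rng = np.random.default_rng(0)
y, keys = srm_sense(rng.standard_normal(256), M=128, rng=rng)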

A conventional CS reconstruction algorithm is then employed to recover the transform coefficient vector α by solving the ℓ1 minimization:

  α̂ = argmin ‖α‖₁ s.t. y = ΦΨα.    (3)

Finally, the signal is recovered as x̂ = Ψα̂. The framework achieves perfect reconstruction if x̂ = x.

To the best of our knowledge, the proposed sensing algorithm is distinct from currently existing methods such as random projection [16], random filters [17], structured Toeplitz [18] and random convolution [19] via the first step of pre-randomization. Its main purpose is to scramble the structure of the signal, converting the sensing signal into a white-noise-like one, in order to achieve universally incoherent sensing.

Depending on the specific application, SRM can offer computational benefits either in the sensing process or in the signal reconstruction process. For applications that allow us to perform the sensing operation by computing the complete transform F, we can exploit the fast computation of the matrix F at the sensing side. However, if it is required to precompute DFR (and then store it in memory for future sensing operations), there would not be any computational benefit at the sensing side. In this case, we can still exploit the structure of SRM to speed up the signal recovery at the reconstruction side, as in most ℓ1-minimization algorithms [3] the majority of the computational complexity is spent on the matrix-vector multiplications Au and Aᵀu, where A = ΦΨ. Note that


both A and Aᵀ are fast computable if the sparsifying matrix Ψ is fast computable, i.e., their computational complexity is on the order of O(N log N). In addition, when F is selected to be the Walsh-Hadamard matrix, the SRM entries only take values in the set {−1, 1}, which is friendly for hardware/optics implementation.
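For instance, the products Au and Aᵀu can be applied in O(N log N) without ever forming Φ explicitly. The sketch below assumes F is the orthonormal DCT and, for brevity, takes Ψ to be the identity (a fast Ψ would simply be composed on the right); idct is the exact adjoint of the orthonormal dct:

import numpy as np
from scipy.fft import dct, idct

def make_operators(r, idx, N, M):
    scale = np.sqrt(N / M)
    def A(u):                                   # u -> sqrt(N/M) * D F R u
        return scale * dct(r * u, norm='ortho')[idx]
    def At(v):                                  # adjoint: sqrt(N/M) * R^T F^T D^T v
        t = np.zeros(N)
        t[idx] = v
        return scale * r * idct(t, norm='ortho')
    return A, At

rng = np.random.default_rng(0)
N, M = 256, 128
A, At = make_operators(rng.choice([-1.0, 1.0], N), rng.choice(N, M, replace=False), N, M)
u, v = rng.standard_normal(N), rng.standard_normal(M)
assert np.isclose(A(u) @ v, u @ At(v))          # <Au, v> == <u, At v>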

The remainder of the paper is organized as follows. We first discuss the incoherence between SRMs and sparsifying transforms in Section III. More specifically, Section III-A will give us a rough intuition of why SRM has sensing performance comparable with Gaussian random matrices. A detailed quantitative analysis of the incoherence of SRMs with the local randomizer and the global randomizer is presented in Section III-B. Based on these incoherence results, the theoretical performance of the proposed framework is analyzed in Section IV, followed by experimental validation in Section V. Finally, Section VI concludes the paper with a detailed discussion of the practical advantages of the proposed framework and its relationship with other related works.

B. Notations

We reserve a bold letter for a vector, a capital and bold letter for a matrix, a capital and bold letter with one sub-index for a row or a column of a matrix, and a capital letter with two sub-indices for an entry of a matrix. We often employ x ∈ ℝᴺ for the input signal, y ∈ ℝᴹ for the measurement vector, Φ ∈ ℝ^{M×N} for the sensing matrix, Ψ ∈ ℝ^{N×N} for the sparsifying matrix and α ∈ ℝᴺ for the transform coefficient vector (x = Ψα). We use the notation supp(z) to indicate the index set (or coordinate set) of nonzero entries of the vector z. Occasionally, we also use T to alternatively refer to this index set of nonzero entries (i.e., T = supp(z)). In this case, z_T denotes the portion of the vector z indexed by the set T and Ψ_T denotes the submatrix of Ψ whose columns are indexed by the set T.

Let A = FR, and let S_ij and F_ij be the entries at the ith row and jth column of AΨ and F, respectively; let R_kk be the kth entry on the diagonal of the diagonal matrix R; and let A_i and Ψ_j be the ith row of A and the jth column of Ψ, respectively.

In addition, we also employ the following notations:

• x_n is on the order of o(z_n), denoted as x_n = o(z_n), if

  lim_{n→∞} x_n / z_n = 0.

• x_n is on the order of O(z_n), denoted as x_n = O(z_n), if

  lim_{n→∞} x_n / z_n = c,

where c is some positive constant.
• A random variable X_n is called asymptotically normally distributed N(0, σ²) if

  lim_{n→∞} P(X_n / σ ≤ x) = (1/√(2π)) ∫_{−∞}^{x} e^{−y²/2} dy.

III. INCOHERENCE ANALYSIS

A. Asymptotic Distribution Analysis

If Φ is an i.i.d. Gaussian matrix N(0, 1/N) and Ψ is an arbitrary orthonormal matrix, ΦΨ is also an i.i.d. Gaussian matrix N(0, 1/N), implying that with overwhelming probability, a Gaussian matrix is highly incoherent with all orthonormal Ψ. In other words, the i.i.d. Gaussian matrix is universally incoherent with fixed transforms (with overwhelming probability). In this section, we will argue that under some mild conditions, with Φ = DFR, where D, F, R are defined as in the previous section, entries of ΦΨ are asymptotically normally distributed N(0, σ²), where σ² ≤ O(1/N). This claim is illustrated in Fig. 1, which depicts the quantile-quantile (QQ) plots of the entries of ΦΨ, where N = 256, F is the 256 × 256 DCT matrix and Ψ is the Daubechies-8 orthogonal wavelet basis. Fig. 1(a) and Fig. 1(b) correspond to the cases where R is the local and the global randomizer, respectively. In both cases, the QQ plots appear straight, as the Gaussian model demands.

Note that Φ is a submatrix of A = FR. Thus, the asymptotic distribution of the entries of AΨ is similar to that of the entries of ΦΨ.
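As a quick numerical sanity check of this claim (not an experiment from the paper), one can verify that the entries of AΨ have standard deviation close to 1/√N; here F is an orthonormal DCT and a random orthonormal basis stands in for the Daubechies-8 wavelet basis:

import numpy as np
from scipy.fft import dct

N = 256
rng = np.random.default_rng(0)
Psi = np.linalg.qr(rng.standard_normal((N, N)))[0]  # stand-in orthonormal basis
r = rng.choice([-1.0, 1.0], N)                      # local randomizer diagonal
AP = dct(r[:, None] * Psi, axis=0, norm='ortho')    # A Psi with A = F R
print(AP.std(), 1 / np.sqrt(N))                     # the two should be comparable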

Before presenting the asymptotic theoretical analysis, we introduce the following assumptions for the local and global randomization models.

1) Assumptions for the Local Randomization Model:

• F is an N × N unit-norm row matrix with the absolute magnitude of all entries on the order of O(1/√N).
• Ψ is an N × N unit-norm column matrix with the maximal absolute magnitude of entries on the order of o(1).

2) Assumptions for the Global Randomization Model: The global randomization model requires the same assumptions as the local randomization model, plus the following extra assumptions:

• The average sum of entries on each column of Ψ is on the order of o(1/√N).
• The sum of entries on each row of F is zero.
• Entries on each row of F and on each column of Ψ are not all equal.

Theorem III.1. Let A = FR, where R is the local randomizer. Given the assumptions for the local randomization model, entries of AΨ are asymptotically normally distributed N(0, σ²) with σ² ≤ O(1/N).

Proof. With the notations defined in Section II-B, we have:

  S_ij = ⟨A_i, Ψ_j⟩ = Σ_{k=1}^{N} F_ik Ψ_kj R_kk.    (4)

Denote Z_k = F_ik Ψ_kj R_kk. Because the R_kk are i.i.d. Bernoulli random variables, the Z_k are i.i.d. zero-mean random variables with E(Z_k) = 0. The assumption that the |F_ik| are on the order of O(1/√N) implies that there exist two positive constants c₁ and c₂ such that:

  (c₁/N) Ψ²_kj ≤ Var(Z_k) = F²_ik Ψ²_kj ≤ (c₂/N) Ψ²_kj.    (5)


Fig. 1. QQ plots comparing the distribution of entries of ΦΨ with the Gaussian distribution. (a) R is the local randomizer. (b) R is the global randomizer. The plots all appear nearly linear, indicating that the entries of ΦΨ are nearly normally distributed. [Plot panels omitted; both axes span −0.25 to 0.25, with x-axis "Normal Quantile N(0,1/N)" and y-axis "Quantile of Input Sample".]

The variance of S_ij, σ², can be bounded as follows:

  c₁/N = (c₁/N) Σ_{k=1}^{N} Ψ²_kj ≤ σ² = Σ_{k=1}^{N} Var(Z_k) ≤ (c₂/N) Σ_{k=1}^{N} Ψ²_kj = c₂/N.    (6)

Because S_ij is a sum of the i.i.d. zero-mean random variables {Z_k}_{k=1}^{N}, according to the Central Limit Theorem (CLT) (see Appendix 1), S_ij → N(0, O(1/N)). To apply the CLT, we need to verify its convergence condition: for a given ε > 0, there exists N sufficiently large such that the Var(Z_k) satisfy:

  Var(Z_k) < ε σ², k = 1, 2, ..., N.    (7)

To show that this convergence condition is met, we argue by contradiction. Assume there exists ε₀ such that for all N there exists at least one k₀ ∈ {1, 2, . . . , N} with:

  Var(Z_k₀) > ε₀ σ².    (8)

From (5), (6) and (8), we obtain:

  ε₀ c₁/N ≤ Var(Z_k₀) ≤ (c₂/N) Ψ²_k₀j.    (9)

This inequality cannot hold if Ψ_k₀j is on the order of o(1). The underlying intuition of the convergence condition is to guarantee that there is no random variable with dominant variance in the sum S_ij. In this case, it simply requires that there is no dominant entry in any column of Ψ.

A similar result can be obtained when R is a uniformly random permutation matrix.

Theorem III.2. Let A = FR, where R is the global randomizer. Given the assumptions for the global randomization model, entries of AΨ are asymptotically normally distributed N(0, σ²), where σ² ≤ O(1/N).

Proof. Let [ω₁, ω₂, ..., ω_N] be a uniform random permutation of [1, 2, ..., N]. Note that {ω_k}_{k=1}^{N} can be viewed as a sequence of random variables with identical distribution. In particular, for a fixed k:

  P(ω_k = i) = 1/N, i = 1, 2, ..., N.

Denote Z_k = F_iω_k Ψ_kj (we omit the dependence of Z_k on i and j to simplify the notation); we have:

  S_ij = ⟨A_i, Ψ_j⟩ = Σ_{k=1}^{N} F_iω_k Ψ_kj = Σ_{k=1}^{N} Z_k.

Using the assumption that the vector F_i has zero average sum and unit norm, we derive:

  E(Z_k) = Ψ_kj E(F_iω_k) = (Ψ_kj/N) Σ_{j=1}^{N} F_ij = 0,

and also,

  E(Z²_k) = Ψ²_kj E(F²_iω_k) = (Ψ²_kj/N) Σ_{j=1}^{N} F²_ij = Ψ²_kj/N.

In addition, note that although the {ω_k}_{k=1}^{N} have identical distribution, they are correlated random variables because the uniformly random permutation is drawn without replacement. Thus, for a pair k and l such that 1 ≤ k ≠ l ≤ N, we have:

  E(Z_k Z_l) = Ψ_kj Ψ_lj E(F_iω_k F_iω_l)
             = (Ψ_kj Ψ_lj/(N(N−1))) Σ_{1≤p≠q≤N} F_ip F_iq
             = (Ψ_kj Ψ_lj/(N(N−1))) ((Σ_{p=1}^{N} F_ip)² − Σ_{p=1}^{N} F²_ip)
             = −Ψ_kj Ψ_lj/(N(N−1)).

The last equality holds because the vector F_i has zero average sum and unit norm. Then, we derive the expectation and the variance of S_ij as follows:

  E(S_ij) = 0;


  Var(S_ij) = Σ_{k=1}^{N} E(Z²_k) + Σ_{1≤k≠l≤N} E(Z_k Z_l)
            = (1/N) Σ_{k=1}^{N} Ψ²_kj − (1/(N(N−1))) Σ_{1≤k≠l≤N} Ψ_kj Ψ_lj
            = 1/N − (1/(N(N−1))) ((Σ_{k=1}^{N} Ψ_kj)² − Σ_{k=1}^{N} Ψ²_kj)
            = 1/N − (1/(N(N−1))) ((Σ_{k=1}^{N} Ψ_kj)² − 1)
            ≤ 1/N + 1/(N(N−1)) = O(1/N).

The fourth equality holds because the column Ψ_j has unit norm. The theorem is then a simple corollary of the Combinatorial Central Limit Theorem [20] (see Appendix 1), provided that its convergence condition can be verified, that is:

  lim_{N→∞} N · [max_{1≤k≤N} (F_ik − F̄_i)² / Σ_{k=1}^{N} (F_ik − F̄_i)²] · [max_{1≤k≤N} (Ψ_kj − Ψ̄_j)² / Σ_{k=1}^{N} (Ψ_kj − Ψ̄_j)²] = 0,    (10)

where

  F̄_i = (1/N) Σ_{k=1}^{N} F_ik;  Ψ̄_j = (1/N) Σ_{k=1}^{N} Ψ_kj.

Because F̄_i = 0, ‖F_i‖₂² = 1 and max_{1≤k≤N} F²_ik = O(1/N), equation (10) holds if the following holds:

  lim_{N→∞} max_{1≤k≤N} (Ψ_kj − Ψ̄_j)² / Σ_{k=1}^{N} (Ψ_kj − Ψ̄_j)² = 0.    (11)

Because the |Ψ̄_j|, j = 1, . . . , N, are on the order of o(1/√N):

  Σ_{k=1}^{N} (Ψ_kj − Ψ̄_j)² = ‖Ψ_j‖₂² − N Ψ̄_j² = 1 − N Ψ̄_j² = O(1).    (12)

Also, since |Ψ̄_j| ≤ max_{1≤k≤N} |Ψ_kj| and the |Ψ_kj| are on the order of o(1):

  max_{1≤k≤N} (Ψ_kj − Ψ̄_j)² ≤ 4 max_{1≤k≤N} Ψ²_kj = o(1).    (13)

The combination of (12) and (13) implies (11), and thus the convergence condition of the Combinatorial Central Limit Theorem is verified.

The condition that each row of F has zero average sum guarantees that entries of FΨ have zero mean, while the condition that entries on each row of F and on each column of Ψ are not all equal prevents the degenerate case in which entries of FΨ become a deterministic quantity. For example, when entries of a row F_i are all equal to 1/√N, S_ij = (1/√N) Σ_{k=1}^{N} Ψ_kj, which is a deterministic quantity, not a random variable. Note that these conditions are not needed when R is the local randomizer.

If F is a DCT matrix, a (normalized) WHT matrix or a (normalized) DFT matrix, all the rows (except for the first one) have zero average sum due to the symmetry of these matrices. The first row, whose entries are all equal to 1/√N, can be considered as the averaging row, or a lowpass filtering operation. When the input signal is zero-mean, this row may be chosen or not without affecting the quality of the reconstructed signal. Otherwise, it should be included in the chosen row set to encode the signal's mean. Lastly, the condition that the absolute average sum of every column of the sparsifying basis Ψ is on the order of o(1/√N) is also close to reality, because the majority of columns of the sparsifying basis Ψ can be roughly viewed as bandpass and highpass filters whose average sums of coefficients are always zero. For example, if Ψ is a wavelet basis (with at least one vanishing moment), then all columns of Ψ (except the one at DC) have a column sum of zero.

The aforementioned theorems show that under certain conditions, the majority of the entries of AΨ (and also of ΦΨ) behave like Gaussian random variables N(0, σ²), where σ² ≤ O(1/N). Roughly speaking, this behavior contributes to a good sensing performance for the proposed framework. However, these asymptotic results are not sufficient for establishing a sensing performance analysis because, in general, the entries of AΨ are not stochastically independent, violating a condition of a Gaussian i.i.d. sensing matrix. In fact, the sensing performance might be quantitatively analyzed by employing a powerful analysis framework for a random subset of rows of an orthonormal matrix [14]. Note that A is also an orthonormal matrix when R is either the local or the global randomizer.

Based on the Gaussian tail probability and a union bound for the maximum absolute value of a random sequence, the maximum absolute magnitude of the entries of AΨ can be asymptotically bounded as follows:

  P(max_{1≤i,j≤N} |S_ij| ≥ t) ≾ 2N² exp(−t²/(2σ²)),

where σ² ≤ c/N, c is some positive constant and ≾ stands for "asymptotically smaller than or equal to", i.e., when N goes to infinity, ≾ becomes ≤.

If we choose t = √(2c log(2N²/δ)/N), the above inequality is equivalent to:

  P(max_{1≤i,j≤N} |S_ij| ≤ √(2c log(2N²/δ)/N)) ≿ 1 − δ,

which implies that with probability at least 1 − δ, the mutual coherence of A and Ψ is upper bounded by O(√(log(N/δ)/N)), which is close to the optimal bound, except for the log N factor.

In the following section, we will employ a more powerful tool from the theory of concentration inequalities to analyze the coherence between A = FR and Ψ when N is finite. We also consider a more general case in which F is a sparse matrix (e.g., a block-diagonal matrix).

B. Incoherence Analysis

Before presenting theoretical results for the incoherence analysis, we introduce assumptions for the block-based local and global randomization models.

1) Assumptions for the Block-based Local Randomization Model:

• F is an N × N unit-norm row matrix with the maximal absolute magnitude of entries on the order of O(1/√B), where 1 ≤ B ≤ N, i.e., max_{1≤i,j≤N} |F_ij| = c/√B for some positive constant c.
• Ψ is an N × N unit-norm column matrix.

2) Assumptions for the Block-based Global Randomization Model: The block-based global randomization model requires the same assumptions as the block-based local randomization model, plus the following assumption:

• All rows of F have zero average sum.

Theorem III.3. Let A = FR, where R is the local randomizer. Given the assumptions for the block-based local randomization model, then

• With probability at least 1 − δ, the mutual coherence of A and Ψ is upper bounded by O(√(log(N/δ)/B)).
• In addition, if the maximal absolute magnitude of the entries of Ψ is on the order of O(1/√N), the mutual coherence is upper bounded by O(√(log(N/δ)/N)), which is independent of B.

Proof. A common proof strategy for this theorem, as well as for the other theorems in this paper, is to establish a large deviation inequality implying that the quantity of interest is concentrated around its expected value with high probability. The proof steps include:

• Showing that the quantity of interest is a sum of independent random variables;
• Bounding the expectation and variance of the quantity;
• Applying a relevant concentration inequality for a sum of random variables;
• Applying a union bound for the maximum absolute value of a random sequence.

In this case, the quantity of interest is:

  S_ij = ⟨A_i, Ψ_j⟩ = Σ_{k∈supp(F_i)} F_ik Ψ_kj R_kk.

Denote Z_k = F_ik Ψ_kj R_kk for k ∈ supp(F_i) (the support set of the row F_i). Because the R_kk are i.i.d. Bernoulli random variables, the Z_k are also i.i.d. random variables with E(Z_k) = 0. The Z_k are bounded as well, because Z_k = ±F_ik Ψ_kj.

S_ij is a sum of independent, bounded random variables. Applying Hoeffding's inequality (see Appendix 2) yields:

  Pr(|S_ij| ≥ t) ≤ 2 exp(−t² / Σ_{k∈supp(F_i)} F²_ik Ψ²_kj).

The next step is to evaluate σ² = Σ_{k∈supp(F_i)} F²_ik Ψ²_kj. Here, σ² can be roughly viewed as an approximation of the variance of S_ij.

  σ² ≤ max_{1≤i,j≤N} |F_ij|² Σ_{k∈supp(F_i)} Ψ²_kj ≤ max_{1≤i,j≤N} |F_ij|² = c²/B.    (14)

If the maximal absolute magnitude of the entries of Ψ is on the order of O(1/√N), i.e., max_{1≤i,j≤N} |Ψ_ij| = c/√N, where c is some positive constant, then

  σ² ≤ max_{1≤i,j≤N} |Ψ_ij|² Σ_{1≤k≤N} F²_ik ≤ max_{1≤i,j≤N} |Ψ_ij|² = c²/N.    (15)

Finally, we derive an upper bound of the mutual coherence µ = max_{1≤i,j≤N} |S_ij| by taking a union bound for the maximum absolute value of a random sequence:

  P(max_{1≤i,j≤N} |S_ij| ≥ t) ≤ 2N² exp(−t²/σ²).

Choosing t = √(σ² log(2N²/δ)) and simplifying the inequality, we get:

  P(max_{1≤i,j≤N} |S_ij| ≤ √(σ² log(2N²/δ))) ≥ 1 − δ.

Thus, for an arbitrary Ψ, (14) holds and we obtain the first claim of the theorem:

  P(max_{1≤i,j≤N} |S_ij| ≤ √(c² log(2N²/δ)/B)) ≥ 1 − δ.

In the case that (15) holds, we obtain the second claim of the theorem:

  P(max_{1≤i,j≤N} |S_ij| ≤ √(c² log(2N²/δ)/N)) ≥ 1 − δ.

Remark III.1. When A is some popular transform such as the DCT or the normalized WHT, the maximal absolute magnitude of its entries is on the order of O(1/√N). As a result, the mutual coherence of A and an arbitrary Ψ is upper bounded by O(√(log(N/δ)/N)), which is also consistent with our asymptotic analysis above. In other words, when at least Φ or Ψ is a dense and uniform matrix, i.e., the maximal absolute magnitude of its entries is on the order of O(1/√N), their mutual coherence approaches the minimal bound, except for the log N factor. In general, the mutual coherence between an arbitrary Ψ and a sparse matrix A (e.g., a block-diagonal matrix of block size B) might be √(N/B) times larger.

Cumulative coherence is another way to quantify the incoherence between two matrices [21].

Definition III.1. The cumulative coherence of an N × N matrix A and an N × K matrix B is defined as:

  µ_c(A, B) = max_{1≤i≤N} √(Σ_{1≤j≤K} ⟨A_i, B_j⟩²),

where A_i and B_j are rows of A and columns of B, respectively.
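In code, the definition amounts to the largest row norm of AB restricted to the chosen columns (a minimal sketch, with T an arbitrary index set):

import numpy as np

def cumulative_coherence(A, Psi, T):
    G = A @ Psi[:, T]                    # G[i, j] = <A_i, Psi_j> for j in T
    return np.sqrt((G ** 2).sum(axis=1)).max()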

The cumulative coherence µ_c(A, B) measures the average incoherence between two matrices A and B, while the mutual coherence µ(A, B) measures the entry-wise incoherence. As a result, the cumulative coherence seems to be a better indicator of average sensing performance. In many cases, we are only interested in the cumulative coherence between A and Ψ_T, where T is the support of the transform coefficient vector. As will be shown in the following section, the cumulative coherence provides a more powerful tool to obtain a tighter bound for the number of measurements required for exact recovery.

From the definition of cumulative coherence, it is easy to verify that µ_c ≤ √K µ. If we directly apply the result of Theorem III.3, we obtain a trivial bound of the cumulative coherence: µ_c = O(√(K log N / B)) for an arbitrary basis Ψ and µ_c = O(√(K log N / N)) for a dense and uniform Ψ. In fact, we can get rid of the factor log N by directly bounding the cumulative coherence from its definition.

Theorem III.4. Let A = FR, where R is the local randomizer. Given the assumptions for the block-based local randomization model, with probability at least 1 − δ, the cumulative coherence of A and Ψ_T, where |T| = K, is upper bounded by (2c/√B) max(√K, 4√(log(2N/δ))).

Proof. Denote U = Ψ*_T and let U_k be the columns of U. Let A_i and Ψ_j (j ∈ T) be rows of A and columns of Ψ_T, respectively. Then:

  S_i = √(Σ_{j∈T} ⟨A_i, Ψ_j⟩²) = ‖A_i Ψ_T‖₂ = ‖Σ_{k∈supp(F_i)} R_kk F_ik U_k‖₂.

Denote V_k = F_ik U_k and let V be the matrix with columns V_k, k ∈ supp(F_i). First, we derive an upper bound for the Frobenius norm of V:

  ‖V‖²_F ≤ max_{1≤i,j≤N} F²_ij ‖U‖²_F = c²K/B.

The last equality holds because ‖U‖²_F = K. Also, the bound for the spectral norm is:

  ‖V‖₂² = sup_{‖β‖₂=1} Σ_{k∈supp(F_i)} |⟨β, V_k⟩|²
        = sup_{‖β‖₂=1} Σ_{k∈supp(F_i)} F²_ik (Σ_{j=1}^{K} β_j U_kj)²
        ≤ max_{1≤i,j≤N} F²_ij sup_{‖β‖₂=1} Σ_{1≤k≤N} |⟨β, U_k⟩|²
        ≤ (c²/B) ‖U‖₂² = c²/B.

The last equality holds because ‖U‖₂² = 1. Now, we have:

  S_i = ‖Σ_{k∈supp(F_i)} R_kk F_ik U_k‖₂ = ‖Σ_{k∈supp(F_i)} R_kk V_k‖₂.

Let us denote Z = Σ_{k∈supp(F_i)} R_kk V_k.

Z is a Rademacher sum of vectors and S_i = ‖Z‖₂ is a random variable. To show that S_i is concentrated around its expectation, we first derive a bound on E(‖Z‖₂). It is easy to verify that for a random variable X, E(X) ≤ √(E(X²)). Thus, we will derive the upper bound for the simpler quantity E(‖Z‖₂²):

  E(‖Z‖₂²) = E(Z*Z) = Σ_{k,l∈supp(F_i)} E(R_kk R_ll) ⟨V_k, V_l⟩ = Σ_{k∈supp(F_i)} ⟨V_k, V_k⟩ = ‖V‖²_F = c²K/B.

The third equality holds because the R_kk are i.i.d. Bernoulli random variables and thus E(R_kk R_ll) = 0 for all k ≠ l. As a result,

  E(S_i) = E(‖Z‖₂) ≤ c√(K/B).

Applying Ledoux's concentration inequality for the norm of a Rademacher sum of vectors [22] (see Appendix 2), and noting that ‖V‖₂² can be viewed as the variance of S_i, yields:

  Pr(S_i ≥ c√(K/B) + t) ≤ 2 exp(−t² B/(16c²)).

Finally, applying a union bound for the maximum absolute value of a random process, we obtain:

  Pr(max_{1≤i≤N} S_i ≥ c√(K/B) + t) ≤ 2N exp(−t² B/(16c²)).

Choosing t = (4c/√B)√(log(2N/δ)), we get:

  Pr(max_{1≤i≤N} S_i ≥ (c/√B)(√K + 4√(log(2N/δ)))) ≤ δ.

Finally, we derive:

  Pr(max_{1≤i≤N} S_i ≥ (2c/√B) max(√K, 4√(log(2N/δ)))) ≤ δ.

Remark III.2. When K ≥ 16 log(2N/δ), the cumulative coherence is upper bounded by O(√(K/B)). When K ≤ 16 log(2N/δ), the upper bound of the cumulative coherence is O(√(log(N/δ)/B)), which is similar to that of the mutual coherence in Theorem III.3.

Remark III.3. When F is some popular transform such as the DCT or the normalized WHT, the maximal absolute magnitude of its entries is on the order of O(1/√N). As a result, the cumulative coherence of A and any arbitrary Ψ_T, where |T| = K, is upper bounded by O(√(K/N)) if K > 16 log(2N/δ).

Remark III.4. The above theorem represents a worst-case analysis, because Ψ can be an arbitrary matrix (the worst case corresponds to Ψ being the identity matrix). When Ψ is known to be dense and uniform, the upper bound of the cumulative coherence, according to Theorem III.3 and the fact that µ_c ≤ µ√K, is O(√(K log N / N)), which is, in general, better than O(√(K/B)).

The asymptotic distribution analysis in Section III-A reveals a significant technical difference between the two randomization models. With the local randomizer, the entries of AΨ are sums of independent random variables, while with the global randomizer they are sums of dependent random variables. Stochastic dependence among random variables makes it much harder to set up similar arguments for their sum's concentration. In this case, we will show that the incoherence of A and Ψ might depend on an extra quantity, the heterogeneity coefficient of the matrix Ψ.


Definition III.2. Assume Ψ is an N × N matrix. Let T_k be the support of the column Ψ_k. Define:

  ρ_k = max_{1≤i≤N} |Ψ_ik| / √((1/|T_k|) Σ_{i∈T_k} Ψ²_ik).    (16)

The column-wise heterogeneity coefficient of the matrix Ψ is defined as:

  ρ_Ψ = max_{1≤k≤N} ρ_k.    (17)

Obviously, 1 ≤ ρ_k ≤ √|T_k|. ρ_k illustrates the difference between the largest entry's magnitude and the average energy of the nonzero entries; roughly speaking, it indicates the heterogeneity of the nonzero entries of the vector Ψ_k. If the nonzero entries of a column Ψ_k are homogeneous, i.e., they are on the same order of magnitude, ρ_k is on the order of a constant. If all nonzero entries of the matrix are homogeneous, the heterogeneity coefficient is also on the order of a constant, ρ_Ψ = O(1), and Ψ is referred to as a uniform matrix. Note that a uniform matrix is not necessarily dense; for example, a block-diagonal matrix of DCT or WHT blocks is uniform but sparse.
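A minimal sketch of (16) and (17), computing ρ_k as the peak magnitude over the root-mean-square of the nonzero entries of each column (the tolerance used to detect "nonzero" entries is an assumption, and columns are assumed not to be identically zero):

import numpy as np

def heterogeneity(Psi, tol=1e-12):
    rho = []
    for k in range(Psi.shape[1]):
        col = Psi[:, k]
        nz = col[np.abs(col) > tol]      # entries on the support T_k
        rho.append(np.abs(col).max() / np.sqrt((nz ** 2).mean()))
    return max(rho)                      # rho_Psi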

The following theorem indicates that when the global randomizer is employed, the mutual coherence between A and Ψ is upper bounded by O(ρ_Ψ √(log(N/δ)/B)), where B is the block size of Φ and Ψ is an arbitrary matrix with heterogeneity coefficient ρ_Ψ.

Theorem III.5. Let A = FR, where R is the global randomizer. Assume that |T_k| ≥ 4 log(2N²/δ) for all k ∈ {1, 2, . . . , N}, where T_k is the column support defined as in (16). Given the assumptions for the block-based global randomization model, then

• With probability at least 1 − δ, the mutual coherence of A and Ψ is upper bounded by O(ρ_Ψ √(log(N/δ)/B)), where ρ_Ψ is defined as in (17);
• In addition, if Ψ is dense and uniform, i.e., the maximal absolute magnitude of its entries is on the order of O(1/√N), and B ≥ 4 log(2N²/δ), the mutual coherence is upper bounded by O(√(log(N/δ)/N)), which is independent of B.

Proof. Let [ω₁, ω₂, . . . , ω_N] be a uniformly random permutation of [1, 2, . . . , N], so that

  S_ij = ⟨A_i, Ψ_j⟩ = Σ_{k=1}^{N} F_iω_k Ψ_kj.

As in the proof of Theorem III.2, {ω_k}_{k=1}^{N} can be viewed as a sequence of dependent random variables with identical distribution, i.e., for a fixed k ∈ {1, 2, . . . , N}:

  P(ω_k = i) = 1/N, i ∈ {1, 2, . . . , N}.

The condition on F is equivalent to max_{1≤i,j≤N} |F_ij| = c/√B, where c is some positive constant. Define {q_k}_{k=1}^{N} as follows:

  q_k = (√(B|T_j|)/(2cρ_Ψ)) F_iω_k Ψ_kj + 1/2 if Ψ_kj ≠ 0, and q_k = 0 if Ψ_kj = 0,

where T_j denotes the support of the column Ψ_j. It is easy to verify that 0 ≤ q_k ≤ 1. Define W as the sum of the dependent random variables q_k:

  W = Σ_{k=1}^{N} q_k = (√(B|T_j|)/(2cρ_Ψ)) Σ_{k=1}^{N} F_iω_k Ψ_kj + |T_j|/2 = (√(B|T_j|)/(2cρ_Ψ)) S_ij + |T_j|/2.

Note that the {F_iω_k}_{k=1}^{N} are zero-mean random variables, because F_i has zero average sum. Thus, E(S_ij) = 0 and E(W) = |T_j|/2. Then, applying Sourav Chatterjee's concentration inequality for a sum of dependent random variables [23] (see Appendix 2) results in:

  P((√(B|T_j|)/(2cρ_Ψ)) |S_ij| ≥ ε) ≤ 2 exp(−ε²/(2|T_j| + 2ε)).

Denote t = (2cρ_Ψ/√(B|T_j|)) ε. The above inequality is equivalent to:

  P(|S_ij| ≥ t) ≤ 2 exp(−(B|T_j|/(4c²ρ²_Ψ)) · t²/(2|T_j| + (√(B|T_j|)/(cρ_Ψ)) t)).

By choosing t = 4cρ_Ψ √((1/B) log(2N²/δ)), we obtain:

  P(|S_ij| ≥ t) ≤ 2 exp(−4|T_j| log(2N²/δ)/(2|T_j| + 4√(|T_j| log(2N²/δ)))).

If |T_j| ≥ 4 log(2N²/δ), the denominator inside the exponent is smaller than 4|T_j|. Thus,

  P(|S_ij| ≥ 4cρ_Ψ √((1/B) log(2N²/δ))) ≤ 2 exp(−log(2N²/δ)) = δ/N².

Finally, after taking the union bound for the maximum absolute value of a random sequence and simplifying the inequality, we obtain the first claim of the theorem:

  P(max_{1≤i,j≤N} |S_ij| ≤ O(ρ_Ψ √(log(N/δ)/B))) ≥ 1 − δ.

If Ψ is known to be dense and uniform, i.e., max_{1≤i,j≤N} |Ψ_ij| = c₁/√N, where c₁ is some positive constant, we then define {q_k}_{k=1}^{N} as follows:

  q_k = (√(BN)/(2cc₁)) F_ik Ψ_ω_k j + 1/2 if F_ik ≠ 0, and q_k = 0 if F_ik = 0.

Note that 0 ≤ q_k ≤ 1 and E(W) = B/2, where W = Σ_{k=1}^{N} q_k. Repeating the same arguments as above, we have:

  P(|S_ij| ≥ t) ≤ 2 exp(−(NB/(4c²c₁²)) · t²/(2B + (√(NB)/(cc₁)) t)).

Similarly, choosing t = 4cc₁ √((1/N) log(2N²/δ)), we can derive:

  P(|S_ij| ≥ t) ≤ 2 exp(−4B log(2N²/δ)/(2B + 4√(B log(2N²/δ)))).


If B ≥ 4 log(2N²/δ), the denominator inside the exponent is smaller than 4B. Thus,

  P(|S_ij| ≥ 4cc₁ √((1/N) log(2N²/δ))) ≤ δ/N².

After taking the union bound for the maximum absolute value of a random sequence, we obtain the second claim of the theorem.

Remark III.5. The first part of the theorem implies that when F is a dense and uniform matrix (e.g., DCT or normalized WHT) and Ψ is a uniform matrix (not necessarily dense), the mutual coherence closely approaches the minimum bound O(√(log(N/δ)/N)). Although in this theorem the mutual coherence depends on the heterogeneity coefficient, one will see in the experimental Section V that this dependence is almost negligible in practice.

As a consequence of this theorem, when at least A or Ψ is dense and uniform, the mutual coherence of A and Ψ is roughly on the order of O(√(log N / N)), which is quite close to the minimal bound 1/√N, except for the log N factor. Otherwise, the coherence depends on the block size B of F and is on the order of O(√(log N / B)). As a matter of fact, this bound is almost optimal, because when Ψ is the identity matrix, the mutual coherence is actually equal to the maximum absolute magnitude of the entries of A, which is on the order of O(1/√B).

Remark III.6. Although the theoretical results for the global randomizer seem to be always weaker than those for the local randomizer, there are a few practical motivations to study the global randomizer. Speech scrambling has been used for a long time for secure voice communication. Also, analog image/video scrambling has been implemented for commercial security-related applications such as CCTV surveillance systems. In addition, permutation does not change the dynamic range of the sensing signal, i.e., there is no bit expansion in implementation. The computational cost of a random permutation is only O(N), which is very easy to implement in software. From a security perspective, the operation of random permutation offers a larger key space than random sign flipping (N! vs. 2^N). Also, as will be shown in the numerical experiment section, with random permutation one can get a highly sparse measurement matrix.
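For illustration, both randomizers are one-line array operations (a sketch, not from the paper):

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
x_global = x[rng.permutation(len(x))]            # global randomizer: N! possible keys
x_local = rng.choice([-1.0, 1.0], len(x)) * x    # local randomizer: 2^N sign patterns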

IV. COMPRESSIVE SAMPLING PERFORMANCE ANALYSIS

Section III demonstrates that under some mild conditions the matrices A and Ψ are highly incoherent, implying that the matrix AΨ is almost dense. When AΨ is dense, the energy of the nonzero transform coefficients α_T is distributed over all measurements. Generally speaking, this is good for signal recovery from a small subset of measurements, because if the energy of some transform coefficients were concentrated in a few measurements that happened to be bypassed in the sampling process, there would be no hope for exact signal recovery even when employing the most sophisticated reconstruction method. This section shows that a random subset of rows of the matrix A = FR yields an almost optimal measurement matrix Φ for compressive sensing.

A. Assumptions for Performance Analysis

A signal x is assumed to be sparse in some sparsifying basis Ψ: x = Ψα, where the vector of transform coefficients α has no more than K nonzero entries. The sign sequence of the nonzero transform coefficients α_T, denoted as z, is assumed to be a random vector of i.i.d. Bernoulli random variables (i.e., P(z_i = ±1) = 1/2). Let y = Φx be the measurement vector, where Φ = √(N/M) DFR is a Structurally Random Matrix. The assumptions of the block-based local randomization and block-based global randomization models hold.

B. Theoretical Results

Theorem IV.1. With probability at least 1 − δ, the proposed sensing framework can recover K-sparse signals exactly if the number of measurements M ≥ O((N/B) K log²(N/δ)). If F is dense and uniform rather than block-diagonal (e.g., a DCT or normalized WHT matrix), the number of measurements needed is on the order of O(K log²(N/δ)).

Proof. This is a simple corollary of the theorem of Candès et al. [14, Theorem 1.1] (see (1)), because (i) A = FR is an orthonormal matrix, and (ii) of our incoherence results between A and Ψ in Theorem III.3 and Theorem III.5.

Remark IV.1. If Ψ is dense and uniform, the number of measurements for exact recovery is always O(K log²(N/δ)), regardless of the block size B. This implies that we can use the identity matrix for the transform F (B = 1). For example, when the input signal is known to be spectrally sparse, compressively sampling it in the time domain is as efficient as in any other transform domain.

Compared with the framework that uses random projection, there is an upscale factor of log N in the number of measurements for exact recovery. In fact, by employing the bound on the cumulative coherence, we can eliminate this upscale factor and thus show an optimal performance guarantee.

Theorem IV.2. Assume that the sparsity K > 16 log(2N/δ). With probability at least 1 − δ, the proposed framework employing the local randomizer can reconstruct K-sparse signals exactly if the number of measurements M ≥ O((N/B) K log(N/δ)). If F is a dense and uniform matrix (e.g., DCT or normalized WHT), the minimal number of required measurements is M = O(K log(N/δ)).

Proof. The proof is based on the cumulative coherence result of Theorem III.4 and a modification of the proof framework of compressed sensing in [14].

Denote U = √(N/M) FRΨ, U_T = √(N/M) FRΨ_T, U_Ω = √(N/M) DFRΨ and U_ΩT = √(N/M) DFRΨ_T, where the support Ω = {k | D_kk = 1, k = 1, 2, . . . , N}. Let v_k, k ∈ {1, 2, . . . , N}, be the columns of U*_T. Denote µ_c = max_{1≤k≤N} ‖v_k‖₂, where µ_c = µ_c(A, Ψ_T) is the cumulative coherence of A = √(N/M) FR and Ψ_T. According to the above incoherence analysis, µ_c ≤ O(√(KN/(BM))). Also, denote by µ the mutual coherence of A and Ψ_T, µ ≤ O(√(N log N/(BM))).


As indicated in [12], [14], to show exact recovery via ℓ1 minimization, it is sufficient to verify the Exact Recovery Principle.

Exact Recovery Principle. With high probability, |π_k| < 1 for all k ∈ T^c, where T^c is the complementary set of T and π = U*_Ω U_ΩT (U*_ΩT U_ΩT)^{−1} z, where z is the sign vector of the nonzero transform coefficients α_T.

Note that π_k = ⟨ν_k (U*_ΩT U_ΩT)^{−1}, z⟩, where ν_k is the kth row of U*_Ω U_ΩT, for k ∈ T^c. To establish the Exact Recovery Principle, we first derive the following lemmas. The first lemma bounds the norm of ν_k.

Lemma IV.1 (Bound on the norm of ν_k). With high probability, ‖ν_k‖ is on the order of O(µ_c):

  P(‖ν_k‖ ≥ µ_c + aσ) ≤ 3 exp(−γa²),

where σ, γ and a are certain numbers.

Proof. Let U_k be the columns of U. For k ∈ T^c:

  ν_k = Σ_{i=1}^{N} D_ii U_ik v_i = Σ_{i=1}^{N} (D_ii − M/N) U_ik v_i,

where the second equality holds because Σ_{i=1}^{N} U_ik v_i = U*_T U_k = 0, which results from the orthogonality of the columns of U. Let Z_i = D_ii − M/N. Because the D_ii are i.i.d. binary random variables with P(D_ii = 1) = M/N, the Z_i are zero-mean i.i.d. random variables with E(Z²_i) = (M/N)(1 − M/N). Let H be the matrix with columns h_i = U_ik v_i, i ∈ {1, 2, . . . , N}. Then ν_k can be viewed as a random weighted sum of the column vectors h_i:

  ν_k = Σ_{i=1}^{N} Z_i h_i

and ‖ν_k‖ is a random variable. We have:

  E(‖ν_k‖²) = Σ_{1≤i,j≤N} E(Z_i Z_j)⟨h_i, h_j⟩ = Σ_{1≤i≤N} E(Z²_i)‖h_i‖²,

where the last equality holds because E(Z_i Z_j) = 0 if i ≠ j. Thus,

  E(‖ν_k‖²) = (M/N)(1 − M/N) Σ_{1≤i≤N} U²_ik ‖v_i‖² ≤ (M/N)(1 − M/N) µ²_c Σ_{1≤i≤N} U²_ik ≤ µ²_c,

where the last inequality holds because ‖U_k‖² = N/M. This implies that E(‖ν_k‖) ≤ µ_c. To show that ‖ν_k‖ is concentrated around its mean, we use Talagrand's concentration inequality [24]. First, we have:

  ‖H‖₂² = sup_{‖β‖=1} Σ_{i=1}^{N} |⟨β, h_i⟩|² = sup_{‖β‖=1} Σ_{i=1}^{N} U²_ik |⟨β, v_i⟩|² ≤ µ² sup_{‖β‖=1} Σ_{i=1}^{N} |⟨β, v_i⟩|² = µ² ‖U_T‖₂² = (N/M) µ²,

where the last equality holds because ‖U_T‖₂² = N/M. Thus, we derive the upper bound of the variance σ²:

  σ² = E(Z²_k) ‖H‖₂² ≤ (M/N)(1 − M/N)(N/M) µ² ≤ µ².

In addition, it is obvious that |Z_k| ≤ 1 and thus

  B = max_{1≤i≤N} ‖h_i‖₂ ≤ µ µ_c.

Talagrand's theorem [24] (see Appendix 2) shows that:

  P(‖ν_k‖ − E(‖ν_k‖) ≥ t) ≤ 3 exp(−(t/(cB)) log(1 + Bt/(σ² + B E(‖ν_k‖)))),

where c is some positive constant. Replacing E(‖ν_k‖), σ² and B by their upper bounds on the right-hand side, we obtain:

  P(‖ν_k‖ − E(‖ν_k‖) ≥ t) ≤ 3 exp(−(t/(cµµ_c)) log(1 + µµ_c t/(µ² + µµ²_c))).

The next step is to simplify the right-hand side of the above inequality by replacing the denominator inside the log by two times the dominant term, noting that log(1 + x) ≥ x/2 when x ≤ 1. In particular, there are two cases:

• Case 1: µµ²_c ≥ µ², or equivalently µ²_c ≥ µ. Denote σ² = µµ²_c and t = aσ. If µµ_c t ≤ 2µµ²_c, or equivalently a ≤ 2(1/µ)^{1/2}, then

  P(‖ν_k‖ − E(‖ν_k‖) ≥ t) ≤ 3 exp(−γa²).

• Case 2: µ² ≥ µµ²_c. Denote σ² = µ² and t = aσ. If µµ_c t ≤ 2µ², or equivalently a ≤ 2/µ_c, then

  P(‖ν_k‖ − E(‖ν_k‖) ≥ t) ≤ 3 exp(−γa²),

where γ is some positive constant.

In conclusion, let σ = √(max(µµ²_c, µ²)). Then, for any a ≤ min(2/µ_c, 2/√µ):

  P(‖ν_k‖ ≥ µ_c + aσ) ≤ 3 exp(−γa²),    (18)

where γ is some positive constant.

The second lemma bounds the spectral norm of U*_ΩT U_ΩT.

Lemma IV.2 (Bound on the spectral norm of U*_ΩT U_ΩT). With high probability, the smallest eigenvalue of U*_ΩT U_ΩT is at least 1/2 (equivalently, ‖(U*_ΩT U_ΩT)^{−1}‖ ≤ 2).

Proof. Theorem 1.2 in [14] shows that this holds with probability 1 − δ if M ≥ µ²_c max(c₁ log K, c₂ log(3/δ)), where c₁ and c₂ are some known positive constants.

The third lemma bounds the norm of w_k = ν_k (U*_ΩT U_ΩT)^{−1}.

Lemma IV.3 (Bound on the norm of w_k = ν_k (U*_ΩT U_ΩT)^{−1}). With high probability, ‖w_k‖ is on the order of O(µ_c):

  P(sup_{k∈T^c} ‖w_k‖ ≥ 2µ_c + 2aσ) ≤ 3N exp(−γa²) + P(‖(U*_ΩT U_ΩT)^{−1}‖ ≥ 2),    (19)

where a, γ and σ are defined in the proof of Lemma IV.1.

Proof. Let A be the event that ‖(U*_ΩT U_ΩT)^{−1}‖ ≤ 2, and let B be the event that sup_{k∈T^c} ‖ν_k‖ ≤ µ_c + aσ. Note that

  sup_{k∈T^c} ‖w_k‖ ≤ ‖(U*_ΩT U_ΩT)^{−1}‖ sup_{k∈T^c} ‖ν_k‖.


Thus,

  P(sup_{k∈T^c} ‖w_k‖ ≥ 2µ_c + 2aσ) ≤ P(A^c ∪ B^c) ≤ P(A^c) + P(B^c),

and noting that P(B^c) ≤ 3N exp(−γa²) implies that (19) holds.

To establish the Exact Recovery Principle, we will show that sup_{k∈T^c} |⟨w_k, z⟩| ≤ 1 with high probability. Note that because z is assumed to be a vector of i.i.d. Bernoulli random variables, |⟨w_k, z⟩| is concentrated around its zero mean. In particular, according to Hoeffding's inequality:

  P(|⟨w_k, z⟩| ≥ 1) ≤ 2 exp(−1/(2‖w_k‖²))

  ⇒ P(sup_{k∈T^c} |⟨w_k, z⟩| ≥ 1 | sup_{k∈T^c} ‖w_k‖ ≤ λ) ≤ 2N exp(−1/(2λ²)).

Note that for two arbitrary probabilistic events A and B:

  P(A) = P(A|B)P(B) + P(A|B^c)P(B^c) ≤ P(A|B) + P(B^c).

Now, let A be the event sup_{k∈T^c} |⟨w_k, z⟩| ≥ 1 and B be the event sup_{k∈T^c} ‖w_k‖ ≤ λ. We derive:

  P(sup_{k∈T^c} |⟨w_k, z⟩| ≥ 1) ≤ 2N exp(−1/(2λ²)) + P(sup_{k∈T^c} ‖w_k‖ ≥ λ).    (20)

Choosing λ = 2µ_c + 2aσ, according to (19) and (20), the probability of interest, P(sup_{k∈T^c} |⟨w_k, z⟩| ≥ 1), is upper bounded by:

  3N exp(−γa²) + 2N exp(−1/(2λ²)) + δ.

To show that sup_{k∈T^c} |⟨w_k, z⟩| ≤ 1 with probability 1 − O(δ), it is sufficient to show that the above upper bound is not greater than 3δ. In particular, choose a² = γ^{−1} log(3N/δ), which makes the first term equal to δ.

To make the second term less than δ, it is required that

  1/(2λ²) ≥ log(2N/δ).    (21)

• Case 1: µ²_c ≥ µ. The condition for (18) to hold is a ≤ 2(1/µ)^{1/2}, which is equivalent to:

  1 ≥ (1/4) γ^{−2} µ² log²(3N/δ).

It is easy to see that µ_c ≥ aσ, where σ = (µµ²_c)^{1/2}. In this case, λ ≤ 4µ_c. Thus, (21) holds if

  1 ≥ 32 µ²_c log(2N/δ).    (22)

• Case 2: µ ≥ µ²_c. The condition for (18) to hold is a ≤ 2/µ_c, or equivalently,

  1 ≥ (1/4) γ^{−2} µ²_c log(3N/δ).

If µ_c ≥ aσ, where σ = µ, then λ ≤ 4µ_c and the condition is again (22). Otherwise, λ ≤ 4aσ, and in this case (21) holds if

  1 ≥ 32 γ^{−1} µ² log(2N/δ).

In conclusion, the Exact Recovery Principle is verified if 1 ≥ max(c₁ µ² log²(3N/δ), c₂ µ²_c log(3N/δ)), where c₁ and c₂ are known positive constants. Finally, noting that µ² ≤ O(N log N/(BM)) and µ²_c ≤ O(NK/(BM)), together with the assumption that K ≥ 16 log(2N/δ), the sufficient condition for exact recovery is M ≥ O((N/B) K log(N/δ)). When F is dense and uniform, the condition becomes M ≥ O(K log(N/δ)).

V. NUMERICAL EXPERIMENTS

A. Simulation with Sparse Signals

In this section, we evaluate the sensing performance of several structurally random matrices and compare it with that of the completely random projection. We also explore the connection among sensing performance (probability of exact recovery), streaming capacity (block size of F) and the structure of the sparsifying basis Ψ (e.g., sparsity and heterogeneity).

In the first simulation, the input signal x of length N = 256 is sparse in the DCT domain, i.e., x = Ψα, where the sparsifying basis Ψ is the 256 × 256 IDCT matrix. Its transform coefficient vector α has K nonzero entries whose magnitudes are Gaussian distributed and whose locations are uniformly random, where K ∈ {10, 20, 30, 40, 50, 60}. From the signal x, we generate a measurement vector of length M = 128: y = Φx, where Φ is some structurally random matrix or a completely Gaussian random matrix. The SRMs under consideration are summarized in Table I.

The software l1-magic [1] is employed to recover the signal from its measurements y. For each value of the sparsity K ∈ {10, 20, 30, 40, 50, 60}, we repeat the experiment 500 times and count the probability of exact recovery. The performance curves are plotted in Fig. 2(a): numerical values on the x-axis denote the signal sparsity K, while those on the y-axis denote the probability of exact recovery. We then repeat similar experiments when the input signal is sparse in some sparse and non-uniform basis Ψ. Fig. 2(b) and Fig. 2(c) illustrate the performance curves when Ψ is the Daubechies-8 wavelet basis and the identity matrix, respectively.
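To make this experimental pipeline concrete, the following minimal Python sketch generates a K-sparse DCT-domain signal and senses it with a simple SRM (sign-flip local randomizer, fast transform, random subsampling). The use of scipy's orthonormal DCT, the sqrt(N/M) normalization, and the variable names are our own illustrative assumptions, not the authors' code; the recovery step would be handed to an l1 solver such as l1-magic or GPSR.

    import numpy as np
    from scipy.fft import dct, idct

    rng = np.random.default_rng(0)
    N, M, K = 256, 128, 20

    # K-sparse coefficient vector: Gaussian magnitudes, random support
    alpha = np.zeros(N)
    support = rng.choice(N, size=K, replace=False)
    alpha[support] = rng.standard_normal(K)
    x = idct(alpha, norm='ortho')        # x = Psi * alpha, Psi = IDCT basis

    # SRM sensing: pre-randomize by random sign flips (local randomizer),
    # fast-transform, then keep M transform coefficients at random
    d = rng.choice([-1.0, 1.0], size=N)              # diagonal of R
    omega = rng.choice(N, size=M, replace=False)     # subsampled rows
    y = np.sqrt(N / M) * dct(d * x, norm='ortho')[omega]

    # Recovery (omitted): solve min ||alpha||_1  s.t.  y = Phi * Psi * alpha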

There are a few notable observations from these experimental results. First, the performance of the SRM with a dense transform matrix F (all of its entries are non-zero) is on average comparable to that of the completely random matrix. Second, the performance of the SRM with a sparse transform matrix F, however, depends on the sparsifying basis Ψ of the signal. In particular, if Ψ is dense, the SRM with sparse F also has average performance comparable with the completely random matrix. If Ψ is sparse, the SRM with sparse F often performs worse than the SRM with dense F, revealing a trade-off between sensing performance and streaming capacity. These numerical results are consistent with the theoretical analysis above. In addition, Fig. 2(b) shows that the SRM with the global randomizer seems to work much better than the SRM with the local randomizer when the sparsifying basis Ψ of the signal is sparse.

B. Simulation with Compressible Signals

In this simulation, the signals of interest are natural images of size 512 × 512, such as the Lena, Barbara, and Boat images.


TABLE I
SRMS EMPLOYED IN THE EXPERIMENT WITH SPARSE SIGNALS

Notation   R                  F
WHT64-L    Local randomizer   64 × 64 block diagonal WHT
WHT64-G    Global randomizer  64 × 64 block diagonal WHT
WHT256-L   Local randomizer   256 × 256 block diagonal WHT
WHT256-G   Global randomizer  256 × 256 block diagonal WHT

The sparsifying basis Ψ used for these natural images is the well-known Daubechies 9/7 wavelet transform. All images are implicitly regarded as 1-D signals of length $512^2$. The GPSR software in [3] is used for signal reconstruction.

For such a large-scale simulation, it takes a huge amount of system resources to implement the sensing method of a completely random matrix. Thus, as a benchmark, we adopt the more practical scheme of partial FFT in the wavelet domain (WPFFT), which senses the wavelet coefficients of the signal using the method of partial FFT. Theoretically, WPFFT has optimal performance, as the Fourier matrix is completely incoherent with the identity matrix. Note that WPFFT senses a signal in the transform domain and thus also requires a substantial amount of system resources. The SRMs under consideration are summarized in Table II.
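For illustration, a minimal sketch of the partial-FFT sensing underlying the WPFFT benchmark, assuming the Daubechies 9/7 wavelet coefficients w of an image are already available (the wavelet decomposition itself is not reproduced here, and the function name is our own):

    import numpy as np

    def partial_fft_sense(w, M, rng):
        """Randomly subsampled unitary FFT: y = sqrt(N/M) * (F w)[Omega].
        Measurements are complex; practical schemes keep conjugate-symmetric
        pairs to obtain real values."""
        N = w.size
        Fw = np.fft.fft(w) / np.sqrt(N)               # unitary FFT
        omega = rng.choice(N, size=M, replace=False)  # random subset Omega
        return np.sqrt(N / M) * Fw[omega], omega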

For comparison, we also implement two popular sensing methods: the partial FFT in the time domain (PFFT) [1] and the Scrambled/Permuted FFT (SFFT) in [25], [26], which is equivalent to the dense SRM using the global randomizer.

The performance curves of these sensing ensembles are plotted in Fig. 3(a), Fig. 3(b), and Fig. 3(c), which correspond to the input Lena, Barbara, and Boat images, respectively. Numerical values on the x-axis represent the sampling rate, i.e., the number of measurements over the total number of samples; values on the y-axis represent the quality of reconstruction (PSNR in dB). Lastly, Fig. 4 shows the reconstructed 512 × 512 Boat image from 35% of measurements using the WPFFT, WHT32-G, and WHT512-L ensembles.

As clearly seen in Fig. 3, the PFFT is not an efficient sensing matrix for smooth signals like images, because the Fourier matrix and the wavelet basis are highly coherent. On the other hand, the SRM method, which can roughly be viewed as the PFFT preceded by a pre-randomization step, is very efficient. In particular, with a dense SRM like SFFT, the performance difference between the SRM method and the WPFFT benchmark is less than 1 dB. In addition, the performance of DCT512-L and WHT512-L, which are fully streaming-capable SRMs, degrades by about 1.5 dB, a reasonable sacrifice given that the required buffer size is less than 0.2 percent of the total length of the original signal. Less degradation is obtainable when the buffer size is increased. Also, in all cases, there is no observable performance difference between the DCT and the normalized WHT, which implies that orthonormal matrices whose entries have the same order of absolute magnitude generate comparable performance. In addition, highly sparse SRMs using the global randomizer, such as DCT32-G and

WHT32-G, have experimental performance comparable to that of the dense SRMs. Note that these SRMs are highly sparse: their density is only $2^{-13}$. This observation again verifies that SRM with the global randomizer outperforms SRM with the local randomizer, which might indicate that our theoretical analysis for the global randomizer is not tight. In practice, we believe that the global randomizer always works at least as well as, and often better than, the local randomizer. We leave the theoretical justification of this observation for future research.

VI. DISCUSSION AND CONCLUSION

A. Complexity Discussion

We compare the computational and memory complexity of the proposed SRM with those of other random sensing matrices, such as Gaussian or Bernoulli i.i.d. matrices. In implementation, an i.i.d. Bernoulli matrix is obviously preferred to an i.i.d. Gaussian one, as the former has integer entries {1, −1} and requires only 1 bit to represent each entry. An M × N i.i.d. Bernoulli sensing matrix requires MN bits of storage and MN additions and multiplications per sensing operation. An M × N SRM only requires 2N + N log N bits of storage and N + N log N additions and multiplications per sensing operation. With the SRM method, the computational complexity and the memory required are independent of the number of measurements M. Note that with the SRM method, we do not need to store the matrices D, F, R explicitly. We only need to store the diagonals of D and of R and the fast transform F, resulting in significant savings of both memory space and computational complexity.
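The following sketch (our own illustration, with assumed names, a dense orthonormal DCT standing in for F, and the sqrt(N/M) normalization) shows this memory-efficient realization: only the N sign flips of R and the M indices kept by the subsampling operator D are stored, so the full M × N matrix is never formed.

    import numpy as np
    from scipy.fft import dct

    class SRM:
        """Implicit SRM Phi = sqrt(N/M) * D F R: stores the sign diagonal
        of R and the row indices kept by D; F is applied as a fast
        transform, so one sensing operation costs O(N log N)."""
        def __init__(self, N, M, rng):
            self.N, self.M = N, M
            self.signs = rng.choice([-1.0, 1.0], size=N)       # diagonal of R
            self.omega = rng.choice(N, size=M, replace=False)  # rows kept by D

        def sense(self, x):
            return np.sqrt(self.N / self.M) * dct(self.signs * x,
                                                  norm='ortho')[self.omega]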

The computational complexity and running time of l1-minimization-based reconstruction algorithms often depend critically on whether the matrix-vector multiplications Au and $A^Tu$ can be computed quickly and efficiently, where A = ΦΨ [3]. For simplicity, assume that Ψ is the identity matrix. Then Au = Φu requires MN = O(KN log N) additions and multiplications for a random sensing matrix Φ, but only O(N log N) additions and multiplications for the SRM method. This implies that, at each iteration, SRM can speed up the reconstruction algorithm by a factor of at least K. With compressible signals (e.g., images), the number of measurements acquired tends to be proportional to the signal dimension, for example, M = N/4. In this case, using SRM reduces the computational complexity by a factor of $\frac{N}{4\log N}$.
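Continuing the sketch above (again our own illustration rather than the authors' implementation), the adjoint $\Phi^Tu$ needed by such solvers is equally fast: zero-fill the M measurements back to length N, apply the inverse transform, and undo the sign flips. Both directions can then be wrapped as an implicit operator, here via scipy's LinearOperator, which is one way iterative solvers exploit the O(N log N) structure.

    import numpy as np
    from scipy.fft import idct
    from scipy.sparse.linalg import LinearOperator

    def srm_adjoint(srm, u):
        """Apply Phi^T to an M-vector u in O(N log N) time."""
        v = np.zeros(srm.N)
        v[srm.omega] = u                       # zero-filling: D^T u
        return np.sqrt(srm.N / srm.M) * srm.signs * idct(v, norm='ortho')

    def as_linear_operator(srm):
        # matvec computes A u, rmatvec computes A^T u, as l1 solvers require
        return LinearOperator((srm.M, srm.N),
                              matvec=srm.sense,
                              rmatvec=lambda u: srm_adjoint(srm, u))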

Table III compares the computational complexity and practical features of SRM and a completely random sensing matrix.


TABLE II
SRMS EMPLOYED IN THE EXPERIMENT WITH COMPRESSIBLE SIGNALS

Notation   R                  F
DCT32-G    Global randomizer  32 × 32 block diagonal DCT
WHT32-G    Global randomizer  32 × 32 block diagonal WHT
DCT512-L   Local randomizer   512 × 512 block diagonal DCT
WHT512-L   Local randomizer   512 × 512 block diagonal WHT

Fig. 4
RECONSTRUCTED 512 × 512 Boat IMAGES FROM M/N = 35% SAMPLING RATE. (A) THE ORIGINAL BOAT IMAGE; (B) USING THE WPFFT ENSEMBLE: 28.5 dB; (C) USING THE WHT32-G ENSEMBLE: 28 dB; (D) USING THE WHT512-L ENSEMBLE: 27.7 dB.

TABLE III
PRACTICAL FEATURE COMPARISON

Features                                   SRMs          Completely Random Matrices
No. of measurements for exact recovery     O(K log N)    O(K log N)
Sensing complexity                         O(N log N)    O(KN log N)
Reconstruction complexity per iteration    O(N log N)    O(KN log N)
Fast computability                         Yes           No
Block-based processing                     Yes           No


B. Relationship with Other Related Works

When R is the local randomizer, SRM is somewhat reminiscent of the so-called Fast Johnson-Lindenstrauss Transform (FJLT) [27]. However, SRM employs a simpler matrix D: in FJLT, this matrix D is a completely random matrix with a sparse distribution, and it is unknown whether there exists an efficient implementation of such a sparse random matrix. SRM is relevant for practical applications because of its high performance and fast computation.

In [25], [26], the Scrambled/Permuted FFT is proposed experimentally as a heuristic low-complexity sensing method that is efficient for sensing large signals. To the best of our knowledge, however, there has not been any theoretical analysis of the Scrambled FFT. SRM is a generalized framework in which the Scrambled FFT is just a specific case, thereby also verifying the theoretical validity of the Scrambled FFT.

Random Convolution, which convolves the input signal with a random pulse and then randomly subsamples the measurements, is proposed in [19] as a promising sensing method for practical applications. Although a few other methods exploit the same idea of convolving a signal with a random pulse, for example, the Random Filter in [17] and the Toeplitz-structured sensing matrix in [18], only the Random Convolution method can be shown to approach optimal sensing performance. While sensing methods such as Random Filter and Toeplitz-based CS subsample measurements structurally, the Random Convolution method subsamples measurements in a random fashion, a technique that is also employed in SRM. In addition, the Random Convolution method introduces randomness into the Fourier domain by randomizing the phases of the Fourier coefficients. These two techniques decouple the stochastic dependence among measurements, giving the Random Convolution method higher performance.

SRM is distinct from all the aforementioned methods, including Random Convolution. A key difference is that SRM pre-randomizes the sensing signal directly in its original domain (via the global or the local randomizer), while the Random Convolution method pre-randomizes the sensing signal in the Fourier domain. SRM also extends the Random Convolution method by showing that not only the Fourier transform but also other popular fast transforms, such as the DCT or the WHT, can be employed to achieve similarly high performance. In conclusion, among existing sensing methods, the SRM framework presents an alternative approach to designing high-performance, low-complexity sensing matrices with practical and flexible features.

APPENDIX I

Central Limit Theorem. Let $Z_1, Z_2, \ldots, Z_N$ be mutually independent random variables. Assume $E(Z_k) = 0$ and denote $\sigma^2 = \sum_{k=1}^N \mathrm{Var}(Z_k)$. If, for a given $\epsilon > 0$ and $N$ sufficiently large, the following inequalities hold:
\[
\mathrm{Var}(Z_k) < \epsilon\sigma^2, \quad k = 1, 2, \ldots, N,
\]
then the distribution of the normalized sum $S = \sum_{k=1}^N Z_k$ converges to $\mathcal{N}(0, \sigma^2)$.

Combinatorial Central Limit Theorem. Given two sequences $\{a_k\}_{k=1}^N$ and $\{b_k\}_{k=1}^N$, assume that the $a_k$ are not all equal and the $b_k$ are not all equal. Let $[\omega_1, \omega_2, \ldots, \omega_N]$ be a uniformly random permutation of $[1, 2, \ldots, N]$. Denote $Z_k = a_{\omega_k}$ and
\[
S = \sum_{k=1}^N Z_k b_k.
\]
Then $S$ is asymptotically normally distributed, $\mathcal{N}(E(S), \mathrm{Var}(S))$, if
\[
\lim_{N\to\infty} N\,\frac{\max_{1\le k\le N}(Z_k - \bar{Z})^2}{\sum_{k=1}^N (Z_k - \bar{Z})^2}\cdot\frac{\max_{1\le k\le N}(b_k - \bar{b})^2}{\sum_{k=1}^N (b_k - \bar{b})^2} = 0,
\]
where
\[
\bar{b} = \frac{1}{N}\sum_{k=1}^N b_k \quad\text{and}\quad \bar{Z} = \frac{1}{N}\sum_{k=1}^N Z_k.
\]
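A small Monte Carlo illustration of this statement (our own sanity check, not part of the original appendix): permute one fixed sequence against another, standardize the resulting statistic over many random permutations, and test its closeness to a Gaussian.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    N, trials = 500, 5000
    a, b = rng.random(N), rng.random(N)   # two fixed, non-constant sequences

    # S = sum_k a_{omega_k} b_k over uniformly random permutations omega
    S = np.array([b @ a[rng.permutation(N)] for _ in range(trials)])
    z = (S - S.mean()) / S.std()
    print(stats.kstest(z, 'norm'))        # small KS distance => near-normal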

APPENDIX II

Hoeffding’s Concentration Inequality. SupposeX1, X2, ..., XN are independent random variables andak ≤ XK ≤ bk (k = 1, 2, ..., N ). Define a new randomvariableS =

∑Nk=1 Xk. Then for anyt > 0

P (|S − E(S)| ≥ t) ≤ 2e− 2t2

∑Nk=1

(bk−ak)2 .
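As a quick numerical sanity check of this bound (our own illustration): for uniform variables on [0, 1], the empirical tail probability should never exceed the Hoeffding bound.

    import numpy as np

    rng = np.random.default_rng(1)
    N, trials, t = 100, 50_000, 8.0

    # X_k uniform on [0, 1]: a_k = 0, b_k = 1, so sum (b_k - a_k)^2 = N
    S = rng.random((trials, N)).sum(axis=1)
    empirical = np.mean(np.abs(S - S.mean()) >= t)   # empirical mean for E(S)
    bound = 2 * np.exp(-2 * t**2 / N)
    print(f"empirical tail: {empirical:.3e}, Hoeffding bound: {bound:.3e}")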

Ledoux’s Concentration Inequality. Let ηi1≤i≤N be asequence of independent random variables such that|ηi| ≤ 1almost surely andvvv1, vvv2,. . . , vvvN be vectors in Banach space.Define a new random variable:S = ‖∑N

i=1 ηivvvi‖. Then forany t > 0,

P (S ≥ E(S) + t) ≤ 2 exp(− t2

16σ2)

where σ2 denote the variance ofS and σ2 =sup‖uuu‖≤1

∑Ni=1 |〈uuu,vvvi〉|2.

Talagrand’s Concentration Inequality. LetZk be zero-meani.i.d random variables and bounded|Zk| ≤ λ and uuuk becolumn vectors of a matrixUUU . Define a new random variable:S = ‖∑N

i=1 Zkuuuk‖. Then for anyt > 0:

P (S ≥ E(S) + t) ≤ 3 exp(− t

cBlog(1 +

Bt

σ2 +BE(S)))

where c is some constant, varianceσ2 = E(Z2k)‖UUU‖2 and

B = λmax1≤k≤N ‖uuuk‖.

Sourav’s Concentration Inequality. Let Zij1≤i,j≤N be acollection of numbers from[0, 1]. Let [ω1, ω2, . . . , ωN ] be auniformly random permutation of[1, 2, . . . , N ]. Define a newrandom variable:S =

∑Ni=1 Ziωi

. Then for anyt ≥ 0

P (|S − E(S)| ≥ t) ≤ 2 exp(− t2

4E(S) + 2t).


REFERENCES

[1] E. Candes, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inf. Theory, vol. 52, pp. 489-509, 2006.

[2] D. L. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289-1306, 2006.

[3] M. A. T. Figueiredo, R. D. Nowak, and S. J. Wright, "Gradient projection for sparse reconstruction," IEEE J. Sel. Topics Signal Process., vol. 1, no. 4, pp. 586-597, 2007.

[4] E. T. Hale, W. Yin, and Y. Zhang, "Fixed-point continuation for l1-minimization: Methodology and convergence," SIAM J. Opt., vol. 19, no. 3, pp. 1107-1130, 2008.

[5] E. van den Berg and M. P. Friedlander, "Probing the Pareto frontier for basis pursuit solutions," SIAM J. Scien. Comp., vol. 31, no. 2, pp. 890-912, 2008.

[6] J. Tropp and A. Gilbert, "Signal recovery from random measurements via orthogonal matching pursuit," IEEE Trans. Inf. Theory, vol. 53, no. 12, pp. 4655-4666, Dec. 2007.

[7] D. Needell and J. A. Tropp, "CoSaMP: Iterative signal recovery from incomplete and inaccurate samples," Appl. Comput. Harmon. Anal., vol. 26, pp. 301-321, 2008.

[8] W. Dai and O. Milenkovic, "Subspace pursuit for compressive sensing signal reconstruction," IEEE Trans. Inf. Theory, vol. 55, no. 5, pp. 2230-2249, 2009.

[9] D. L. Donoho, Y. Tsaig, and J.-L. Starck, "Sparse solution of underdetermined linear equations by stagewise orthogonal matching pursuit," Technical Report, 2006.

[10] T. T. Do, L. Gan, N. Nguyen, and T. D. Tran, "Sparsity adaptive matching pursuit algorithm for practical compressed sensing," Asilomar Conf. Sign. Sys. Comput., pp. 581-587, 2008.

[11] D. L. Donoho and X. Huo, "Uncertainty principles and ideal atomic decomposition," IEEE Trans. Inf. Theory, vol. 47, no. 7, pp. 2845-2862, 2001.

[12] E. Candes and T. Tao, "Near optimal signal recovery from random projections: Universal encoding strategies?," IEEE Trans. Inf. Theory, vol. 52, no. 12, pp. 5406-5425, 2006.

[13] S. Mendelson, A. Pajor, and N. Tomczak-Jaegermann, "Uniform uncertainty principle for Bernoulli and subgaussian ensembles," Constructive Approx., vol. 28, pp. 269-283, 2008.

[14] E. Candes and J. Romberg, "Sparsity and incoherence in compressive sampling," Inverse Problems, vol. 23, no. 3, 2007.

[15] R. Coifman, F. Geshwind, and Y. Meyer, "Noiselets," Appl. Comput. Harmon. Anal., vol. 10, pp. 27-44, 2001.

[16] E. Candes and T. Tao, "Decoding by linear programming," IEEE Trans. Inf. Theory, vol. 51, no. 12, pp. 4203-4215, 2005.

[17] J. Tropp, M. Wakin, M. Duarte, D. Baron, and R. Baraniuk, "Random filters for compressive sampling and reconstruction," IEEE Conf. Acous. Speech Sign. Proc., vol. 3, pp. 872-875, 2006.

[18] W. Bajwa, J. Haupt, G. Raz, S. Wright, and R. Nowak, "Toeplitz-structured compressed sensing matrices," IEEE Stat. Sign. Proc. (SSP), pp. 26-29, 2007.

[19] J. Romberg, "Compressive sensing by random convolution," SIAM J. Imaging Sci., vol. 2, pp. 1098-1128, 2009.

[20] W. Hoeffding, "A combinatorial central limit theorem," The Annals Math. Stat., vol. 22, pp. 558-566, 1951.

[21] K. Schnass and P. Vandergheynst, "Average performance analysis for thresholding," IEEE Sign. Proc. Letters, vol. 14, no. 11, 2007.

[22] M. Ledoux, The Concentration of Measure Phenomenon, American Mathematical Society, 2001.

[23] S. Chatterjee, "Stein's method for concentration inequalities," Probab. Theory Related Fields, vol. 138, pp. 305-321, 2007.

[24] M. Talagrand, "New concentration inequalities in product spaces," Invent. Math., vol. 126, pp. 505-563, 1996.

[25] E. Candes, J. Romberg, and T. Tao, "Stable signal recovery from incomplete and inaccurate measurements," Comm. Pure Applied Math., vol. 59, no. 8, 2006.

[26] M. F. Duarte, M. B. Wakin, and R. G. Baraniuk, "Fast reconstruction of piecewise smooth signals from incoherent projections," Workshop Sign. Proc. Adapt. Sparse Struc. Represent., 2005.

[27] N. Ailon and B. Chazelle, "Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform," Proc. 38th ACM Symp. Theory Comput., pp. 557-563, 2006.

Fig. 2
PERFORMANCE CURVES: PROBABILITY OF EXACT RECOVERY VS. SPARSITY K. (A) WHEN Ψ IS THE IDCT BASIS. (B) WHEN Ψ IS THE DAUBECHIES-8 WAVELET BASIS. (C) WHEN Ψ IS THE IDENTITY BASIS.
[Axes: signal sparsity K (10-60) on the x-axis; probability of perfect recovery (0-1) on the y-axis. Curves in (a): i.i.d. Gaussian operator, WHT256-G, WHT256-L, WHT32-G, WHT32-L. Curves in (b) and (c): i.i.d. Gaussian operator, WHT256-G, WHT256-L, WHT64-G, WHT64-L.]


Fig. 3
PERFORMANCE CURVES: QUALITY OF SIGNAL RECONSTRUCTION VS. SAMPLING RATE M/N. (A) THE 512 × 512 LENA IMAGE. (B) THE 512 × 512 BARBARA IMAGE. (C) THE 512 × 512 BOAT IMAGE.
[Axes: sampling rate (0-0.8) on the x-axis; PSNR (dB) on the y-axis. Each panel ("R-D performance") compares WPFFT, PFFT, SFFT, WHT512-L, WHT32-G, DCT512-L, and DCT32-G.]