Construction of a Large Class of Deterministic Sensing Matrices That Satisfy a Statistical Isometry Property

Robert Calderbank, Fellow, IEEE, Stephen Howard, Member, IEEE, and Sina Jafarpour, Member, IEEE

Abstract—Compressed Sensing aims to capture attributes of k-sparse signals using very few measurements. In the standard compressed sensing paradigm, the measurement matrix Φ is required to act as a near isometry on the set of all k-sparse signals (restricted isometry property or RIP). Although it is known that certain probabilistic processes generate matrices that satisfy RIP with high probability, there is no practical algorithm for verifying whether a given sensing matrix Φ has this property, crucial for the feasibility of the standard recovery algorithms. In contrast, this paper provides simple criteria that guarantee that a deterministic sensing matrix satisfying these criteria acts as a near isometry on an overwhelming majority of k-sparse signals; in particular, most such signals have a unique representation in the measurement domain. Probability still plays a critical role, but it enters the signal model rather than the construction of the sensing matrix. An essential element in our construction is that we require the columns of the sensing matrix to form a group under pointwise multiplication. The construction allows recovery methods for which the expected performance is sub-linear in C, and only quadratic in N, as compared to the super-linear complexity in C of the Basis Pursuit or Matching Pursuit algorithms; the focus on expected performance is more typical of mainstream signal processing than the worst case analysis that prevails in standard compressed sensing. Our framework encompasses many families of deterministic sensing matrices, including those formed from discrete chirps, Delsarte–Goethals codes, and extended BCH codes.

Index Terms—Delsarte–Goethals codes, deterministic compressed sensing, finite groups, martingale sequences, McDiarmid inequality, statistical near isometry.

I. INTRODUCTION AND NOTATIONS

THE central goal of compressed sensing is to capture attributes of a signal using very few measurements.

Manuscript received February 15, 2009; revised October 20, 2009. Current version published March 17, 2010. The work of R. Calderbank and S. Jafarpour was supported in part by the National Science Foundation (NSF) under Grant DMS 0701226, in part by the Office of Naval Research (ONR) under Grant N00173-06-1-G006, and by AFOSR under Grant FA9550-05-1-0443. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Jared Tanner.

R. Calderbank is with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544 USA (e-mail: [email protected]).

S. Howard is with the Australian Defence Science and Technology Organisation (DSTO), Edinburgh 5111, Australia.

S. Jafarpour is with the Department of Computer Science, Princeton University, Princeton, NJ 08544 USA.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSTSP.2010.2043161

In most work to date, this broader objective is exemplified by the important special case in which a k-sparse vector α (with the ambient dimension C large) is to be reconstructed from a small number N of linear measurements, with N ≪ C. In this problem, the measurement data constitute a vector f = Φα, where Φ is an N × C matrix called the sensing matrix. Throughout this paper, we shall use the notation φ_j for the jth column of the sensing matrix Φ; its entries will be denoted by φ_j(x) (with the label x varying from 1 to N). In other words, φ_j(x) is the xth-row and jth-column element of Φ.

The two fundamental questions in compressed sensing are: how to construct suitable sensing matrices Φ, and how to recover α from f efficiently; it is also of practical importance to be resilient to measurement noise and to be able to reconstruct (approximations to) k-compressible signals, i.e., signals that have more than k nonvanishing entries, but where only k entries are significant and the remaining entries are close to zero.

The work of Donoho [9] and of Candès, Romberg, and Tao [2], [10], [11] provides fundamental insight into the geometry of sensing matrices. This geometry is expressed by, e.g., the Restricted Isometry Property (RIP), formulated by Candès and Tao [10]: a sensing matrix satisfies the k-Restricted Isometry Property if it acts as a near isometry on all k-sparse vectors; to ensure unique and stable reconstruction of k-sparse vectors, it is sufficient that Φ satisfy 2k-RIP. When k and/or C are (very) small, deterministic RIP matrices have been constructed using methods from approximation theory [12] and coding theory [13]. More attention has been paid to probabilistic constructions where the entries of the sensing matrix are generated by an independent and identically distributed (i.i.d.) Gaussian or Bernoulli process or from random Fourier ensembles, in which larger values of k and/or C can be considered. These sensing matrices are known to satisfy the k-RIP with high probability [9], [10] and the number of measurements is N = O(k log(C/k)). This is best possible in the sense that approximation results of Kashin [14] and Gluskin [15] imply that this many measurements are required for sparse reconstruction using ℓ1-minimization methods. Constructions of random sensing matrices of similar size that have the RIP but require a smaller degree of randomness are given by several approaches, including filtering [16], [17] and expander graphs [1], [5], [6], [18].

The role of random measurement in compressive sensing can be viewed as analogous to the role of random coding in Shannon theory. Both provide worst case performance guarantees in the context of an adversarial signal/error model. Random sensing



TABLE I: PROPERTIES OF k-SPARSE RECONSTRUCTION ALGORITHMS THAT EMPLOY RANDOM SENSING MATRICES WITH N ROWS AND C COLUMNS. THE PROPERTY RIP-1 IS THE COUNTERPART OF RIP FOR THE ℓ1 METRIC AND IT PROVIDES GUARANTEES ON THE PERFORMANCE OF SPARSE RECONSTRUCTION ALGORITHMS THAT EMPLOY LINEAR PROGRAMMING [1]. NOTE THAT EXPLICIT CONSTRUCTION OF THE EXPANDER GRAPHS REQUIRES A LARGE NUMBER OF MEASUREMENTS, AND THAT MORE PRACTICAL ALTERNATIVES ARE RANDOM SPARSE MATRICES WHICH ARE EXPANDERS WITH HIGH PROBABILITY.

matrices are easy to construct, and are k-RIP with high probability. As in coding theory, this randomness has its drawbacks, briefly described as follows.

• First, efficiency in sampling comes at the cost of complexity in reconstruction (see Table I) and at the cost of error in signal approximation (see Section V).

• Second, storing the entries of a random sensing matrix may require significant space, in contrast to deterministic matrices where the entries can often be computed on the fly without requiring any storage.

• Third, there is no algorithm for efficiently verifying whether a sampled sensing matrix satisfies RIP, a condition that is essential for the recovery guarantees of the Basis Pursuit and Matching Pursuit algorithms on any sparse signal.

These drawbacks lead us to consider constructions with deterministic sensing matrices, for which the performance is guaranteed in expectation only, for k-sparse signals that are random variables, but which do not suffer from the same drawbacks. The framework presented here provides:

— easily checkable conditions on special types of deterministic sensing matrices guaranteeing successful recovery of all but an exponentially small fraction of k-sparse signals;
— in many examples, the entries of these matrices can be computed on the fly without requiring any storage;
— recovery algorithms with lower complexities than Basis Pursuit and Matching Pursuit algorithms.

To make this last point more precise, we note that Basis Pursuit and Matching Pursuit algorithms rely heavily on matrix-vector multiplication, and are super-linear with respect to C, the dimension of the data domain. The reconstruction algorithm for the framework presented here (see Section V) requires only vector-vector multiplication in the measurement domain; as a result, its recovery time is only quadratic in the dimension N of the measurement domain. We suggest that the role of the deterministic measurement matrices presented here for compressive sensing is analogous to the role of structured codes in communications practice: in both cases fast encoding and decoding algorithms are emphasized, and typical rather than worst case performance is optimized. We are not the only ones seeking inspiration in coding theory to construct deterministic matrices for compressed sensing; Table II gives an overview of approaches in the literature that employ deterministic sensing matrices, several of which are based on linear codes (cf. [19] and [21]) and provide expected-case rather than worst-case performance guarantees. It is important to note (see Table II) that although the use of linear codes makes fast algorithms possible for sparse reconstruction, these are not always resilient to noise. Such non-resilience manifests itself in, e.g., Reed–Solomon (RS) constructions [21]; the RS reconstruction algorithm (the roots of which go back to 1795!—see [26] and [27]) uses the input data to construct an error-locator polynomial; the roots of this polynomial identify the signals appearing in the sparse superposition. Because the correspondence between the coefficients of a polynomial and its roots is not well conditioned, it is very difficult to deal with compressible signals and noisy measurements in RS-based approaches.

Because we will be interested in expected-case performance only, we need not impose RIP; we shall instead work with the weaker Statistical Restricted Isometry Property. More precisely, we define the following.

Definition 1 ((k, δ, ε)-StRIP Matrix): An N × C (sensing) matrix Φ is said to be a (k, δ, ε)-Statistical Restricted Isometry Property matrix [abbreviated (k, δ, ε)-StRIP matrix] if, for k-sparse vectors α, the inequalities

    (1 − δ) ‖α‖² ≤ ‖Φα‖² ≤ (1 + δ) ‖α‖²        (1)

hold with probability exceeding 1 − ε (with respect to a uniform distribution of the vectors α among all k-sparse vectors in ℝ^C or ℂ^C with the same fixed magnitudes).¹

¹Throughout the paper, norms without a subscript denote ℓ2-norms.
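As a concrete reading of Definition 1, the following Python sketch estimates the StRIP failure probability of a candidate matrix by Monte Carlo over random supports. The function name, the unit-magnitude model for the nonzero entries, and the assumption that Φ has unit-norm columns (so that (1) is the relevant normalization) are ours, not the paper's.

```python
import numpy as np

def estimate_strip_failure(Phi, k, delta, trials=10000, rng=None):
    """Estimate the probability that (1) fails for a randomly supported k-sparse vector."""
    rng = np.random.default_rng() if rng is None else rng
    N, C = Phi.shape
    failures = 0
    for _ in range(trials):
        support = rng.choice(C, size=k, replace=False)        # uniformly random support
        a = np.zeros(C, dtype=complex)
        a[support] = np.exp(2j * np.pi * rng.random(k))        # unit-magnitude nonzero entries
        lhs, rhs = np.linalg.norm(a) ** 2, np.linalg.norm(Phi @ a) ** 2
        if not ((1 - delta) * lhs <= rhs <= (1 + delta) * lhs):
            failures += 1
    return failures / trials
```

A returned value below ε is (Monte Carlo) evidence that the matrix behaves as a (k, δ, ε)-StRIP matrix for that sparsity model.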


TABLE II: PROPERTIES OF k-SPARSE RECONSTRUCTION ALGORITHMS THAT EMPLOY DETERMINISTIC SENSING MATRICES WITH N ROWS AND C COLUMNS. NOTE THAT FOR LDPC CODES … . NOTE ALSO THAT RIP HOLDS FOR RANDOM MATRICES WHERE IT IMPLIES EXISTENCE OF A LOW-DISTORTION EMBEDDING FROM ℓ2 INTO ℓ1. GURUSWAMI et al. [18] PROVED THAT THIS PROPERTY ALSO HOLDS FOR DETERMINISTIC SENSING MATRICES CONSTRUCTED FROM EXPANDER CODES. IT FOLLOWS FROM THEOREM 8 IN THIS PAPER THAT SENSING MATRICES BASED ON DISCRETE CHIRPS AND DELSARTE–GOETHALS CODES SATISFY THE UStRIP.

There is a slight wrinkle in that, unlike the simple RIP case, StRIP does not automatically imply unique reconstruction, not even with high probability. If an N × C matrix Φ is (k, δ, ε)-StRIP, then, given a k-sparse vector α, it does follow that Φ maps any other randomly picked k-sparse signal β to a different image, i.e., Φβ ≠ Φα, with probability exceeding 1 − ε (with respect to the random choice of β). This does not mean, however, that uniqueness is guaranteed with high probability: requiring that the measure of {α : α is k-sparse and there is a different k-sparse β for which Φβ = Φα} be small is a more stringent requirement than that the measure of {β : β is k-sparse and Φβ = Φα} be small for all k-sparse α. For this reason, we also introduce the following definition.

Definition 2 ((k, δ, ε)-UStRIP Matrix): An N × C (sensing) matrix Φ is said to be a (k, δ, ε)-Uniqueness-guaranteed Statistical Restricted Isometry Property matrix [abbreviated (k, δ, ε)-UStRIP matrix] if Φ is a (k, δ, ε)-StRIP matrix, and

    {β : Φβ = Φα and β is k-sparse} = {α}

with probability exceeding 1 − ε (with respect to a uniform distribution of the vectors α among all k-sparse vectors of the same norm).

Again, we are not the first to propose a weaker version of RIP that permits the construction of deterministic sensing matrices. The construction by Guruswami et al. in [18] can be viewed as another instance of a weakening of RIP, in the following different direction. RIP implies that Φ defines a low-distortion ℓ2-to-ℓ1 embedding that plays a crucial role in the proofs of [2] and [9]–[11]. In [18], Guruswami et al. prove that this embedding property also holds for deterministic sensing matrices constructed from expander codes. These matrices satisfy an "almost Euclidean null space property": for any v in the null space of Φ, ‖v‖₂ is bounded by a constant multiple of ‖v‖₁/√C; this is their main tool to obtain the results reported in Table II.

In this paper, we formulate simple design rules, imposing that the columns of the sensing matrix form a group under pointwise multiplication, that all row sums vanish, that different rows are orthogonal, and requiring a simple upper bound on the absolute value of any column sum (other than that of the multiplicative identity). The properties we require are satisfied by a large class of matrices constructed by exponentiating codewords from a linear code; several examples are given in Section II. In Section III, we show that our relatively weak design rules are sufficient to guarantee that Φ is UStRIP, provided the parameters satisfy certain constraints. The group property makes it possible to avoid intricate combinatorial reasoning about coherence of collections of mutually unbiased bases (cf. [28]). Section IV applies our results to the case where the sensing matrix is formed by taking random rows of the fast Fourier transform (FFT) matrix. In Section V, we emphasize a particular family of constructions involving subcodes of the second order Reed–Muller code; in this case codewords correspond to multivariable quadratic functions defined over the binary field or the integers modulo 4. Section VI provides a discussion regarding noise resilience.

II. StRIP-ABLE: BASIC DEFINITIONS, WITH SEVERAL EXAMPLES

In this section, we formulate three basic conditions and give examples of deterministic sensing matrices with N rows and C columns that satisfy these conditions. Note that throughout the paper, we shall assume (without stating this again explicitly) that Φ has no repeated columns.

Definition 3: An N × C matrix Φ is said to be η-StRIP-able, where η satisfies 0 < η ≤ 1, if the following three conditions are satisfied.

• (St1) The rows of Φ are orthogonal, and all the row sums are zero, i.e.,

    Σ_{j=1}^{C} φ_j(x) conj(φ_j(y)) = 0   if x ≠ y,        (2)

    Σ_{j=1}^{C} φ_j(x) = 0   for all x.        (3)


• (St2) The columns of Φ form a group under "pointwise multiplication," defined as follows:

    for all j, j′ ∈ {1, …, C} there exists a j″ ∈ {1, …, C} such that φ_{j″}(x) = φ_j(x) φ_{j′}(x) for all x.        (4)

In particular, there is one column of Φ for which all the entries are 1, and that acts as a unit for this group operation; this column will be denoted by φ_1. Without loss of generality, we will assume the columns of Φ are ordered so that

this is the first column, i.e., φ_1(x) = 1 for all x.

• (St3) For all j ≠ 1,

    |Σ_{x=1}^{N} φ_j(x)| ≤ N^{1−η}.        (5)

Remarks:
1) Condition (5) applies to all columns except the first column (i.e., the column which consists of all ones).
2) The justification of the name StRIP-able will be given in the next section.
3) When the value of η in (5) does not play a special role, we just do not spell it out explicitly, and simply call Φ StRIP-able.
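Before turning to the consequences of these conditions, here is a small Python sketch that tests (St1)–(St3) numerically for a candidate matrix with unimodular entries. The function name is ours, and the column-sum threshold is passed in explicitly so the sketch does not commit to a particular form of the bound in (5).

```python
import numpy as np

def check_strip_able(Phi, col_sum_bound, tol=1e-9):
    """Numerically test conditions (St1)-(St3) for a matrix Phi with unimodular entries.

    col_sum_bound: the right-hand side of (5), supplied by the caller.
    Returns a dict with one boolean per condition.
    """
    N, C = Phi.shape
    # (St1): rows orthogonal and all row sums zero.
    gram_rows = Phi @ Phi.conj().T                          # N x N matrix of row inner products
    st1 = (np.allclose(gram_rows - np.diag(np.diag(gram_rows)), 0, atol=tol)
           and np.allclose(Phi.sum(axis=1), 0, atol=tol))
    # (St2): closure under pointwise multiplication, checked by brute force.
    cols = {tuple(np.round(Phi[:, j], 8)) for j in range(C)}
    st2 = all(tuple(np.round(Phi[:, j] * Phi[:, jp], 8)) in cols
              for j in range(C) for jp in range(C))
    # (St3): every column sum except that of the all-ones column is small.
    ones_cols = [j for j in range(C) if np.allclose(Phi[:, j], 1, atol=tol)]
    st3 = all(abs(Phi[:, j].sum()) <= col_sum_bound
              for j in range(C) if j not in ones_cols)
    return {"St1": st1, "St2": st2, "St3": st3}
```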

The conditions (2)–(5) have the following immediate consequences.

Lemma 4: If the matrix Φ satisfies (4), then |φ_j(x)| = 1, for all j and all x.

Proof: For every x, the set {φ_j(x) : 1 ≤ j ≤ C} is a group of complex numbers under multiplication; all finite groups of this type consist of unimodular numbers.

Lemma 5: If the matrix Φ satisfies (4), then the collection of columns of Φ is closed under complex conjugation, i.e., for all j ∈ {1, …, C}, there exists a j′ ∈ {1, …, C} such that, for all x,

    φ_{j′}(x) = conj(φ_j(x)).        (6)

Proof: Pick j ∈ {1, …, C}. Since the columns of Φ form a group under pointwise multiplication, there is some j′ ∈ {1, …, C} such that φ_{j′} is the inverse of φ_j for this group operation. Using Lemma 4, we have then, for all x,

φ_{j′}(x) = φ_j(x)^{−1} = conj(φ_j(x)).

Lemma 6: If the matrix Φ satisfies (2)–(4), then the normalized columns φ_j/√N form a tight frame in ℂ^N, with redundancy C/N.

Proof: By Lemma 4 and (2), the rows of Φ are pairwise orthogonal and each has squared norm C, i.e., ΦΦ* = C·I_N, so that, for any vector v ∈ ℂ^N, Σ_j |⟨φ_j/√N, v⟩|² = (C/N)‖v‖².
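The computation behind Lemma 6 can be spelled out in full; the chain below uses only the unimodularity from Lemma 4 and the row orthogonality (2).

```latex
% Row x and row y of \Phi have inner product
\sum_{j=1}^{C} \varphi_j(x)\,\overline{\varphi_j(y)} =
\begin{cases}
C, & x = y \quad (\text{each } |\varphi_j(x)| = 1),\\
0, & x \neq y \quad (\text{by (2)}),
\end{cases}
\qquad\text{i.e.}\qquad \Phi\Phi^{*} = C\, I_N .
% Hence, with u_j := \varphi_j/\sqrt{N},
\sum_{j=1}^{C} |\langle u_j, v\rangle|^{2}
   = \frac{1}{N}\,v^{*}\Phi\Phi^{*}v = \frac{C}{N}\,\|v\|^{2}
   \qquad\text{for every } v \in \mathbb{C}^{N},
% which is exactly the tight-frame identity with redundancy C/N.
```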

Lemma 7: If the matrix Φ satisfies (4), then the inner product of two columns φ_j and φ_{j′}, defined as ⟨φ_j, φ_{j′}⟩ = Σ_{x=1}^{N} φ_j(x) conj(φ_{j′}(x)), equals N if and only if j = j′.

Proof: If j = j′, we obviously have ⟨φ_j, φ_j⟩ = N, by Lemma 4. If ⟨φ_j, φ_{j′}⟩ = N, then we have, by Cauchy–Schwarz, N = |⟨φ_j, φ_{j′}⟩| ≤ ‖φ_j‖ ‖φ_{j′}‖ = N, implying that in this instance the Cauchy–Schwarz inequality must be an equality, so that φ_j must be some multiple of φ_{j′}. Since ⟨φ_j, φ_{j′}⟩ = N = ‖φ_{j′}‖², the multiplication factor must equal 1, so that φ_j = φ_{j′}. Since Φ has no repeated columns, j = j′ follows.

We shall prove that StRIP-able matrices have (as their name already announces) a Restricted Isometry Property in a Statistical sense, provided the different parameters satisfy certain constraints, which will be made clear and explicit in the next section. Before we embark on that mathematical analysis, we show that there are many examples of StRIP-able matrices.

A. Discrete Chirp Sensing Matrices

Let p be a prime and let ω = e^{2πi/p} be a primitive (complex) pth root of unity. A length-p chirp signal takes the form

    φ_{m,r}(x) = ω^{rx² + mx},   x ∈ Z_p,

where m is the base frequency and r is the chirp rate. Consider now the family of C = p² chirp signals φ_{m,r}, with m, r ∈ Z_p; an "extra" phase factor (usually not present in chirps) ensures that the row sums Σ_{m,r} φ_{m,r}(x) vanish for all x. It is easy to check that this family satisfies (St1), (St2), and (St3) [23]. For the corresponding sensing matrix Φ, Applebaum et al. [23] have analyzed an algorithm for sparse reconstruction that exploits the efficiency of the FFT in each of two steps: the first to recover the chirp rate and the second to recover the base frequency. The Gerschgorin Circle Theorem [29] is used to prove that the RIP holds for sets of up to … columns. Numerical experiments reported in [23] compare the eigenvalues of deterministic chirp sensing matrices with those of random Gaussian sensing matrices. The singular values of restrictions to k-dimensional subspaces of random Gaussian sensing matrices have a Gaussian distribution, with mean … and standard deviation … ; the experiments show that, for the same values of N, C, and k, the singular values of restrictions of deterministic chirp sensing matrices have a similar spread around a central value that is closer to 1; in fact, the experiments suggest that … .
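The following Python sketch builds a small chirp dictionary of this type and numerically checks unimodularity, row orthogonality, and closure of the columns under pointwise multiplication. The exact phase convention (here simply ω^{rx²+mx}, without the extra row-sum-killing phase mentioned above) is an assumption made for illustration.

```python
import numpy as np
from itertools import product

def chirp_matrix(p):
    """p x p^2 matrix whose columns are the chirps x -> omega^(r*x^2 + m*x), omega = exp(2*pi*i/p)."""
    omega = np.exp(2j * np.pi / p)
    x = np.arange(p)
    return np.stack([omega ** ((r * x * x + m * x) % p)
                     for r, m in product(range(p), range(p))], axis=1)

if __name__ == "__main__":
    p = 7                                                      # any odd prime
    Phi = chirp_matrix(p)
    print(np.allclose(np.abs(Phi), 1))                         # unimodular entries
    print(np.allclose(Phi @ Phi.conj().T, p * p * np.eye(p)))  # orthogonal rows, squared norm C = p^2
    # Closure under pointwise multiplication: (r1,m1)*(r2,m2) gives the chirp (r1+r2, m1+m2) mod p.
    cols = {tuple(np.round(Phi[:, j], 8)) for j in range(p * p)}
    print(all(tuple(np.round(Phi[:, i] * Phi[:, j], 8)) in cols
              for i in range(p * p) for j in range(p * p)))
```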

B. Kerdock, Delsarte–Goethals, and Second Order Reed–Muller Sensing Matrices

In our construction of deterministic sensing matrices based on Kerdock, Delsarte–Goethals, and second-order Reed–Muller codes, we start by picking an odd number m. The rows of the sensing matrix are indexed by the binary m-tuples x, and


the columns are indexed by the pairs (P, b), where P is an m × m binary symmetric matrix in the Delsarte–Goethals set DG(m, r), and b is a binary m-tuple. The entry φ_{P,b}(x) is given by

    φ_{P,b}(x) = i^{wt(d_P) + 2 wt(b)} i^{x P x^T + (2b + d_P) x^T},        (7)

where d_P denotes the main diagonal of P, and wt(·) denotes the Hamming weight (the number of 1s in the binary vector). Note that all arithmetic in the expressions x P x^T + (2b + d_P) x^T and wt(d_P) + 2 wt(b) takes place in the ring of integers modulo 4, since they appear only as exponents for i. Given P and b, the vector of exponents x P x^T + (2b + d_P) x^T is a codeword in the Delsarte–Goethals code (defined over the ring of integers modulo 4). For a fixed matrix P, the columns {φ_{P,b} : b} form an orthonormal basis that can also be obtained by postmultiplying the Walsh–Hadamard basis by a diagonal unitary transformation determined by P.

The Delsarte–Goethals set DG(m, r) is a binary vector space containing 2^{(r+1)m} binary symmetric matrices with the property that the difference of any two distinct matrices has rank at least m − 2r (see [30]). The Delsarte–Goethals sets are nested:

    DG(m, 0) ⊂ DG(m, 1) ⊂ ⋯ ⊂ DG(m, (m − 1)/2).

The first set DG(m, 0) is the classical Kerdock set, and the last set DG(m, (m − 1)/2) is the set of all m × m binary symmetric matrices. The Delsarte–Goethals sensing matrix is determined by DG(m, r) and has N = 2^m rows and C = 2^{(r+2)m} columns. The initial phase in (7) is chosen so that the Delsarte–Goethals sensing matrices satisfy (St1) and (St2) (see Appendix A).

Coherence between orthonormal bases indexed by binary symmetric matrices P and Q is determined by the rank R of the binary matrix P + Q (see Appendix A). Any vector in one of the orthonormal bases has inner product of absolute value 2^{−R/2} with 2^R vectors in the other basis and is orthogonal to the remaining basis vectors. The column sums in this Delsarte–Goethals sensing matrix satisfy

    |Σ_x φ_{P,b}(x)| ≤ 2^{(m + 2r)/2} = 2^r √N,

so that condition (St3) is trivially satisfied. Details are provided in Appendix A; we refer the interested reader to [30]–[32] and [33, Ch. 15] for more information about subcodes of the second-order Reed–Muller code.

C. BCH Sensing Matrices

The Carlitz–Uchiyama bounds (see [33, Ch. 9]) imply that the interval

    [2^{m−1} − (t − 1) 2^{m/2}, 2^{m−1} + (t − 1) 2^{m/2}]

contains all nonzero weights in the dual of the extended binary BCH code of length N = 2^m and designed distance 2t + 1, with the exception of … . Setting C equal to the number of codewords in this dual code, the columns of the BCH sensing matrix are obtained by exponentiating those codewords. The column determined by the codeword c is given by φ_c(x) = (−1)^{…}, where … and where … is any vector not orthogonal to … . Conditions (St1) and (St2) hold by construction, and the column sums satisfy |Σ_x φ_c(x)| ≤ 2(t − 1) 2^{m/2}, so that (St3) holds. These sensing matrices have been analyzed by Ailon and Liberty [34].

In the binary case, the column sums take the form N − 2w, where w is the Hamming weight of the exponentiated codeword, and a similar interpretation is possible for codes that are linear over the ring of integers modulo 4 (see [30]). Property (St3) connects the Hamming geometry of the code domain, as captured by the weight enumerator of the code, with the geometry of the complex domain.
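The link between codeword weight and column sum in the preceding paragraph can be written out explicitly; the display below uses only the Carlitz–Uchiyama interval quoted above, and the final constant follows from it.

```latex
% Column sum of a +/-1 column obtained by exponentiating a binary codeword c of length N:
\sum_{x=1}^{N} (-1)^{c(x)} = \#\{x : c(x)=0\} - \#\{x : c(x)=1\} = N - 2\,\mathrm{wt}(c).
% If every relevant weight satisfies |\mathrm{wt}(c) - 2^{m-1}| \le (t-1)2^{m/2} (Carlitz--Uchiyama),
% then, with N = 2^m,
\Bigl|\sum_{x=1}^{N} (-1)^{c(x)}\Bigr| = 2\,\bigl|2^{m-1}-\mathrm{wt}(c)\bigr| \le 2(t-1)\sqrt{N},
% which is the column-sum bound needed for (St3).
```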

III. IMPLICATIONS FOR DETERMINISTIC StRIP-ABLE SENSING MATRICES: MAIN RESULT

In this section, we prove our main result, namely that if Φ satisfies (St1), (St2), and (St3), then Φ is UStRIP, under certain fairly weak conditions on the parameters. More precisely:

Theorem 8: Suppose the N × C matrix Φ is η-StRIP-able, and suppose the sparsity k and the distortion δ satisfy … and … . Then there exists a constant c such that, if … , then Φ is (k, δ, ε)-UStRIP with ε = … .

The proof of Theorem 8 has two parts: we shall first, in Section III-A, prove that Φ is StRIP; when this is established, we turn our attention to proving UStRIP in Section III-B.

A. Proving StRIP

1) Setting up the Framework: It will be convenient to decompose the random process generating the vectors α as follows: first pick (randomly) the indices of the nonzero entries of α, and then the values of those entries. For the first step, we pick a random permutation π of {1, …, C}; the numbers π(1), …, π(k) will then be the indices of the non-vanishing entries of α. Next, we pick random values a_1, …, a_k; these will be the nonzero values of the entries of the vector α. Computing expectations with respect to α can be decomposed likewise; when we average over all possible choices of π, but not yet over the values of the random variables a_1, …, a_k, we


shall denote such expectations by E_π, adding a subscript. We start by proving the following.

Lemma 9: For π and α as described above and f = Φα, we have … .

Proof: With the notations introduced above, the entries of f are given by f(x) = Σ_{ℓ=1}^{k} a_ℓ φ_{π(ℓ)}(x). We have then

(8)

where .The first term in (8) is independent of ; it just equals

.For the second term, we have

with

with

(9)

By (4) and Lemma 6, we havefor some appropriate ; if , then

, so that .As ranges over all possible permutations of , the

index (with ) will range (uniformly) over allpossible values (i.e., excluding 1). It follows that, for

(10)

where we have made use of a counting argument in the firstequality, and of (3) in the second. It then follows that

with

Applying the Cauchy–Schwarz inequality, we obtain

with

Combining this with the previous equality gives

It then suffices to substitute this into (8) to prove the Lemma.Remark 10: By using the Cauchy–Schwarz inequality in the

last step of the proof of 9 we may have sacrificed quite a bit,especially if the nonvanishing entries in differ appreciably inorder of magnitude. Without this step, the final inequality wouldbe

with

(11)

To prove the concentration of ‖f‖² around its expected value, we will make use of a version of the McDiarmid inequality [35] based on concentration of martingale difference random variables with distinct values (as opposed to independent values for the standard McDiarmid inequality). In what follows, uppercase letters denote random variables, lowercase letters denote values taken on by these random variables.

Theorem 11 (Self-Avoiding McDiarmid Inequality): Let Ω_1, …, Ω_n be probability spaces and define Ω as the probability space of all distinct n-tuples². In other words, the set Ω is the subset of the product set Ω_1 × ⋯ × Ω_n given by

    Ω = {(ω_1, …, ω_n) ∈ Ω_1 × ⋯ × Ω_n : ω_i ≠ ω_j whenever i ≠ j},        (12)

where the probability measure on Ω is just the renormalization (so as to be a probability measure) of the restriction to Ω of the standard product measure on Ω_1 × ⋯ × Ω_n.

Let F be a function from the set Ω to ℝ, such that, for any coordinate i, given ω_1, …, ω_{i−1}, the bounded-difference condition (13) holds, where the expectations are taken over the random variables Ω_{i+1}, …, Ω_n (conditioned on taking values that are all different from each other and from ω_1, …, ω_{i−1} as

²We follow a widespread custom, and denote by the same letter both the set carrying the probability measure, and the probability space [i.e., the triplet (set, σ-algebra of measurable sets, measure)]. We shall specify which is meant when confusion could be possible.


(13)

well as ω_i (first expectation) or ω′_i (second expectation)). Then, for any positive τ,

(14)

Proof: See Appendix B.

2) Proof of StRIP: We are now ready to start the proof.

Proof (of the (k, δ, ε)-StRIP Property Claimed in Theorem 8): Let Ω denote the set of all k-tuples (π(1), …, π(k)) where π is a permutation of {1, …, C}. It follows from the definition that all entries of each element of Ω are distinct. The set Ω is finite; equipped with the counting measure, renormalized so as to have total mass 1, Ω is the probability space of the nonzero entries of the random signal α: the k-tuples (π(1), …, π(k)), corresponding to (uniformly) randomly picked permutations π of {1, …, C}, are random variables distributed uniformly in Ω. For ℓ ≤ k, we denote by Ω_ℓ the ℓ-tuple of random variables (π(1), …, π(ℓ)).

by , and by. Clearly,

(15)

Our strategy of proof will be the following. We want to upperbound . From Lemma 9 we knowthat is close to . This suggests that we investigate,for , the function defined by

. This lastexpression is exactly of the type for which the Self-Avoiding

McDiarmid Inequality gives upper bounds, provided we can es-tablish first that satisfies the required conditions of the Self-Avoiding McDiarmid inequality. Deriving such a bound is thusour first step.

From (15) and Lemma 2.2 we get

with

We have then the equation shown at the bottom of thepage. Here we have used the same notation as in the proofof Lemma 9, i.e., . Because

and are both in ,the indices and are all different. It thenfollows from (5) that

with

with

(16)

where we have used that if , i.e., if. Because this bound is uniform over the in , and

since , it is now clear that this implies the sufficientcondition of the Self-Avoiding McDiarmid inequality, withgiven by

with

with

with


Since

with

where we have used the Cauchy–Schwarz inequality in the penultimate step, we can conclude from Theorem 11 that

After substituting for , and applying Lemma 9, we finallyobtain that

For … , we can set … , thus recovering the StRIP bound claimed in the statement of Theorem 8 for this case: with probability at least 1 − ε, we have the following near-isometry for k-sparse vectors α:

(17)

Remark 12: Equation (17) implies that, as long as … , the probability of failure (i.e., the probability that the near-isometry inequality fails to hold) drops to zero as … . In particular, if … equals 1, for some constant … less than one, and … , then the probability of failure approaches zero at the rate … .

Remark 13: Fig. 1 shows the distribution of condition numbers for the singular values of restrictions of the sensing matrix to sets of K columns. Two cases are considered: the Reed–Muller matrices constructed in Section II-B and random Gaussian matrices of the same size. The figure suggests that the decay of the condition numbers is similar for both types of compressive sensing matrices.

Fig. 1. Mean and standard deviation for the condition number of k-Gram matrices for Φ, with … , compared to that of a Gaussian random matrix of the same size.

Remark 14: Note that, similar to the case of random and expander matrices, the number of measurements grows as the inverse square of the distortion parameter δ, i.e., N ∝ δ^{−2} as δ → 0.

Remark 15: By avoiding the use of the Cauchy–Schwarz inequality at the end of the proof, and making use of Remark 10, one can sharpen the bounds. From (16) it follows that with probability at least

and

This implies (set … , … ) the bound shown in the equation at the bottom of the page, or, equivalently, in the next equation at the bottom of the next page, with … as above, and … if … , otherwise. The worst case for this bound is when … , in which case we recover the bound in Theorem 8; if one is restricted, for whatever reason, to k-sparse vectors that are known to have some entries that are much larger than other nonvanishing entries, then the more complicated bound given here is tighter.

Remark 16: If the sparsity level k is greater than … , then … . However, since the deterministic sensing matrices of Section II structurally require the condition … , a deterministic matrix with … rows and … columns is required. In this case, the sensing matrix is constructed by choosing … random columns from the deterministic matrix.


Fig. 2. Number of successful reconstructions in 1000 trials versus the sparsity factor k for the deterministic Kerdock sensing matrix corresponding to … .

B. Proving UStRIP: Uniqueness of Sparse Representation

Although we have established the desired near-isometry bounds, we still have to address the uniqueness guarantee; unlike the standard RIP case, this does not follow automatically from a StRIP bound, as pointed out in the Introduction. More precisely, we need to estimate the probability that a randomly picked k-sparse vector α has an "evil twin" β that maps to the same image under Φ, i.e., Φβ = Φα, and prove that this probability is very small. If S is the union of the possible support sets of two k-sparse vectors, that is, if S = supp(α) ∪ supp(β) (so |S| ≤ 2k), then we define Φ_S to be the matrix obtained by picking out only the columns indexed by labels in S. In other words, the matrix elements of Φ_S are those φ_j(x) for which j ∈ S, with x varying over its full range. There will be two different k-sparse vectors α, β, the supports of which are both contained in S, if and only if the matrix Φ_S*Φ_S is rank-deficient (where Φ_S* denotes the conjugate transpose of Φ_S). Note that this property concerns the support set S only—the values of the entries of α are not important. This is similar to the discussion of sparse reconstruction when Φ satisfies a deterministic Null Space Property [12]. Once uniqueness is found to be overwhelmingly likely, we can derive from it the probability that decoding algorithms (such as the quadratic decoding algorithms described in Section V) succeed in constructing, from f, a faithfully exact or close copy (depending on the application) of the k-sparse source vector α.
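Here is a small Python sketch of the rank test just described: it decides, for a given support union S, whether two different vectors supported in S could share the same image under Φ (the function name is ours).

```python
import numpy as np

def support_admits_collision(Phi, S, tol=1e-10):
    """Return True if two *different* vectors supported on S can share the same image under Phi,
    i.e., if Phi_S (equivalently Phi_S^* Phi_S) is rank-deficient."""
    Phi_S = Phi[:, list(S)]
    return np.linalg.matrix_rank(Phi_S, tol=tol) < len(S)
```

Checking every 2k-subset this way is combinatorially hopeless, which is why the argument below settles for showing that randomly chosen supports pass the test with overwhelming probability.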

In fact, it turns out that we will not even have to consider matrices Φ_S with |S| = 2k; as we shall see below, it suffices to consider Φ_S for sets S of cardinality up to … .

Once again, condition (St3) will play a crucial role. For the StRIP analysis in the previous subsection, it sufficed to take η > 0, where η is the parameter that measures the closeness of the column sums in (St3). In this subsection, we will impose a nonzero lower bound on η; we shall see that … suffices for our analysis.

We recall here the formulation of (St3): for any column φ_j of the sensing matrix with j ≠ 1,

    |Σ_{x=1}^{N} φ_j(x)| ≤ N^{1−η}.

We introduced the notation Φ_S at the start of this subsection. We shall also use the special case where we wish to restrict the sensing matrix to a single column indexed by j; in that case, we denote the restriction by Φ_j. Finally, we denote the conjugate transpose of a matrix A by A*.

We shall use Tropp's argument (see Section VII of [36]) to prove uniqueness of sparse representation; to apply this argument, we first need to prove that a random submatrix has small coherence with the remaining columns of the sensing matrix. Again, we use probabilistic methods to prove the low coherence property. First, we show that this property is valid for the expected value of the random submatrix, and then, using the Self-Avoiding McDiarmid inequality, we derive high concentration bounds for the random variable around its expected value.

Lemma 17: Let Φ be η-StRIP-able with η ≥ … , and assume that the conditions … hold, with ε as defined in Theorem 8. Let φ_j be a fixed column of Φ, and let S be the positions of the first k elements of a random permutation of {1, …, C}. Then

    …        (18)

where the expectation is with respect to the choice of the set S.

Proof: By linearity of expectation, we have

    …        (19)

Since the set of columns of Φ is invariant under complex conjugation, and forms a group under pointwise multiplication, we have

where we use again the notation introduced just below (9):. As ranges over all the possible

permutations that do not move , ranges uniformly over


, and the different rangeuniformly over .

Hence,

(20)

where we have used (St1).

Next, we use the Self-Avoiding McDiarmid inequality, together with property (St3), to derive a uniform bound for the random variable … :

random variable :

Theorem 18: Let Φ be η-StRIP-able with η ≥ … , and assume that the conditions … hold; define ε as in Theorem 8, and let S be a set of k random columns of Φ. Then, with probability at least 1 − ε, there exists no j ∉ S such that

    …        (21)

Proof: The proof is in several steps. In the first step, we pick any j, and keep it fixed (for the time being). Let

where we assume that are different ele-ments of , picked at random. Note thatif is a random permutation of , then

. The function , asdefined above from

to , is information-theoretically indistinguishable from thefunction from the permutations of todefined by

We have computed the expectation in Lemma 17; in order to apply the Self-Avoiding McDiarmid Inequality, we need only verify that the required bounded-difference condition of the Self-Avoiding McDiarmid inequality holds.

When we subtract from, only the th term survives; we

have

(22)

by (St3), since . It immediately fol-lows that the concentration condition holds for , with

. Therefore the Self-Avoiding McDiarmid Inequality holdsfor , which means it also holds for : for any positive

All this was for one fixed choice of the column; note that the bound does not depend on the identity of that column. This implies that, by applying union bounds over the possible choices for the column of Φ, we get that the probability that there exists a column such that … is at most … . Writing … in terms of … completes the proof.

If … , the right-hand side of (21) reduces to … , and then, if … , then (for sufficiently small … and sufficiently large … ) a choice of k random columns of Φ has a very high probability of having small coherence with any other column of the matrix; in particular, we have, with probability exceeding 1 − ε, that

(23)

This establishes incoherence between the random submatrix Φ_S and the remaining columns of the sensing matrix.


We can now complete the UStRIP proof by following an argument of Tropp [36]; for completeness we include the argument here.

Lemma 19: Let S be a set of k indices sampled uniformly from {1, …, C}. Assume that Φ is (k, δ, ε)-StRIP. Let T be any other subset of {1, …, C} of size less than or equal to k. Then, with probability at least 1 − ε (with respect to the randomness in the choice of S),

    …        (24)

Proof: First, note that we need check only the case T ⊄ S, since otherwise (24) is immediate. Note also that, because Φ is a tight frame with bounded worst-case coherence … , it follows from Theorems B and D of [36] that the probability that the randomly picked set S satisfies … is at least 1 − ε. (The notation I_S stands for the identity matrix on S; this just amounts to restating the (k, δ, ε)-StRIP condition in matrix form.) It follows that, with probability at least 1 − ε,

    …        (25)

where σ_min(Φ_S) is the smallest singular value of Φ_S. Since T ⊄ S, T has at least one index not in S. Denote that index by j. Since the entries of the matrix Φ are all unimodular, we have

    ‖φ_j‖² = N.        (26)

Let P_S be the orthogonal projection operator onto the range of Φ_S. We shall prove (24) by showing that ‖P_S φ_j‖ < ‖φ_j‖, which implies that there exists a vector in the range of Φ_T that is outside the range of Φ_S. Note that

(27)

Since Φ is (k, δ, ε)-StRIP, we have, still with probability at least 1 − ε,

    …

where the penultimate inequality is by (23).

Theorem 20: Let Φ be η-StRIP-able with η ≥ … , and

assume that the conditions … hold; define ε as in Theorem 8, and let α be a randomly picked k-sparse signal. Then, with probability at least 1 − ε (with respect to the random choice of α), α is the only k-sparse vector that satisfies the equation Φβ = Φα.

Proof: We have already proved in Section III-A that Φ is (k, δ, ε)-StRIP. We start by recalling that the random choice of α can be viewed as first choosing its support S, a uniformly distributed subset of size k within {1, …, C}, and then, once the support is fixed, choosing a random vector within the corresponding k-dimensional vector space. For this last choice no distribution has been specified; we shall just assume that it is absolutely continuous with respect to the Lebesgue measure on ℝ^k

or ℂ^k.

It follows from Theorems B and D in [36] that, with probability exceeding 1 − ε, the matrix Φ_S*Φ_S is nonsingular, so that Φ is injective on vectors supported on S with probability exceeding 1 − ε. The near-isometry property of Φ_S implies that no two signals with support S can have the same value in the measurement domain. If there nevertheless were a k-sparse vector β with Φβ = Φα, the support of β would therefore necessarily be different from S. By Lemma 19, we know that the intersection of the range of Φ_S with the range of any other restriction Φ_T is at most … -dimensional. It follows that, in order to possibly have an "evil twin" β, the vector α must itself lie in the at most … -dimensional space that is the inverse image of this intersection under Φ_S. This set, however, has measure zero with respect to any measure that is absolutely continuous with respect to the k-dimensional Lebesgue measure. Thus, for each k-set S for which Φ_S is a near-isometry, the vectors α that are not uniquely determined by their image Φα constitute a set of measure zero. Since randomly chosen k-sets produce restrictions that are near-isometric with probability exceeding 1 − ε, the theorem is proved.

Combining Remark 12 with Theorem 20 completes the proof of Theorem 8.

IV. PARTIAL FOURIER ENSEMBLES

In partial Fourier ensembles, the matrix Φ is formed by uniform random selection of N rows from the C × C discrete Fourier transform matrix. The resulting random sensing matrices are widely used in compressed sensing, because the corresponding memory cost is only that of storing the N selected row indices, in contrast to the O(NC) cost of storing Gaussian and Bernoulli matrices. Moreover, it is known [7], [10] that if N is a sufficiently large multiple of k times logarithmic factors in C, then with overwhelming probability, the partial Fourier matrix satisfies the RIP property. It is easy to verify that such a Φ satisfies Conditions (St1) and (St2). We now show that it also satisfies Condition (St3) almost surely.

Note that here, in contrast to the proof of Theorem 8, the randomness is with respect to the choice of the N rows from the discrete Fourier transform matrix. We show that, with overwhelming probability, the condition (St3) is satisfied for every column of this randomly sampled matrix. First, fix a column j other than the identity column, and define the random variable Y_x to be the value of the entry φ_j(x), where the randomness is with respect to the choice of the rows of Φ (that is, with respect to the choice of x). Since the rows are chosen uniformly


at random, and the column sums (for all but the first column) in the discrete Fourier transform are zero, we have

    E[Y_x] = 0.        (28)

Since all entries are unimodular, we may apply Hoeffding's inequality to both the real and the imaginary part of the random variable Σ_{x=1}^{N} Y_x, then apply union bounds to conclude that, for all t > 0, the probability that |Σ_{x=1}^{N} Y_x| exceeds t decays exponentially in t²/N. Applying union bounds to all admissible columns, we get that

    Pr[ there exists a column average greater than … ]        (29)

is at most … . Hence, with probability at least 1 − … , all column averages are O(√((log C)/N)), and all column sums are less than a constant multiple of √(N log C), so that condition (St3) is indeed satisfied. Applying Theorem 8, we see that a partial Fourier matrix satisfies StRIP with only N = O(k log C) measurements. This improves upon the best previous upper bound obtained in [10] and helps explain why partial Fourier matrices work well in practice.
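For completeness, here is one way to make the Hoeffding step quantitative, assuming the N rows are sampled independently; the constants below are ours (chosen for simplicity), not the paper's.

```latex
% Each sampled entry Y_x is unimodular and E[Y_x] = 0 by (28). Hoeffding's inequality applied
% separately to the real and imaginary parts (each bounded by 1) gives, for any t > 0,
\Pr\Bigl[\,\Bigl|\sum_{x=1}^{N} Y_x\Bigr| > t\,\Bigr] \le 4\exp\!\Bigl(-\tfrac{t^{2}}{4N}\Bigr).
% Taking t = \sqrt{4N\log(4C/\epsilon)} and a union bound over the admissible columns yields
\Pr\Bigl[\,\exists\, j\neq 1:\ \Bigl|\sum_{x=1}^{N}\varphi_j(x)\Bigr| > \sqrt{4N\log(4C/\epsilon)}\,\Bigr] \le \epsilon ,
% i.e., with probability at least 1-\epsilon every column sum is O(\sqrt{N\log C}).
```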

V. QUADRATIC RECONSTRUCTION ALGORITHM

Algorithm 1 Quadratic Reconstruction Algorithm

Input: N-dimensional measurement vector f
Output: An approximation α̂ to the signal α

1: Set α̂ = 0, … , … .
2: for … or while … do
3:   for each entry … to … do
4:     pointwise multiply f with a shift (offset) of itself as in (30).
5:   end for
6:   Compute the fast Walsh–Hadamard transform of the pointwise product: Equation (31).
7:   Find the position of the next peak in the Hadamard domain: Equation (32) implies that the chirp-like cross terms appear as a constant background signal.
8:   if … then
9:     Restore … .
10:  end if
11:  Update … which minimizes … .
12:  Add … to entry … of α̂.
13:  Set … .
14:  Set … .
15: end for

The Quadratic Reconstruction Algorithm [23]–[25], described in detail above, takes advantage of the multivariable quadratic functions that appear as exponents in Delsarte–Goethals sensing matrices. It is this structure that enables the algorithm to avoid the matrix-vector multiplication required when Basis and Matching Pursuit algorithms are applied to random sensing matrices. Because our algorithm requires only vector-vector multiplication in the measurement domain, the reconstruction complexity is sublinear in the dimension C of the data domain. The Delsarte–Goethals sensing matrix was introduced in Section II-B: there are N = 2^m rows indexed by binary m-tuples x, and C columns indexed by pairs (P, b), where P is a binary symmetric matrix and b is a binary m-tuple. The first step in our algorithm is pointwise multiplication of a sparse superposition f with a shifted copy of itself. The sensing matrix is obtained by exponentiating multivariable quadratic functions, so the first step produces a sparse superposition of pure frequencies (in the example below, these are Walsh functions in the binary domain) against a background of chirp-like cross terms:

    …        (30)

Then the (fast) Hadamard transform concentrates the energy of the first term at (no more than) k Walsh–Hadamard tones, while the second term distributes energy uniformly across all tones. The Fourier coefficient is

    …        (31)

and it can be shown (see [25]) that the energy of the chirp-like cross terms is distributed uniformly in the Walsh–Hadamard domain. That is, for any coefficient,

    …        (32)

Equation (32) is related to the variance of the Walsh–Hadamard coefficients and may be viewed as a fine-grained concentration estimate. In fact, the proof of (32) mirrors the proof of the UStRIP property given in Section III; first we show that the expected value of any Walsh–Hadamard coefficient is zero, and then we use the Self-Avoiding McDiarmid Inequality to prove concentration about this expected value. The Walsh–Hadamard tones appear as spikes above a constant background signal, and the quadratic algorithm learns the terms in the sparse superposition by varying the offset. These terms can be peeled off in decreasing order of signal strength or processed in a list. The quadratic algorithm is a repurposing of the chirp detection algorithm commonly used in navigation radars, which is known to work extremely well


in the presence of noise. Experimental results show close approach to the information-theoretic lower bound on the required number of measurements. For example, as demonstrated in Fig. 2, numerical experiments show that the quadratic decoding algorithm is able to reconstruct greater than 40-sparse superpositions when applied to deterministic Kerdock sensing matrices with N = … and C = … . In this case, the information-theoretic lower bound is … [24].
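The following Python sketch illustrates the mechanism behind Steps 3–7 of Algorithm 1, but for the discrete-chirp matrices of Section II-A rather than the Delsarte–Goethals case, since the arithmetic is simpler; the DG version replaces the FFT by the fast Walsh–Hadamard transform. All parameter choices here are ours.

```python
import numpy as np

def chirp_column(p, r, m):
    x = np.arange(p)
    return np.exp(2j * np.pi * ((r * x * x + m * x) % p) / p)

# A 3-sparse superposition of chirps (chirp rates, base frequencies, amplitudes).
p = 127
rates, freqs, amps = [3, 45, 90], [11, 60, 100], [1.0, 0.9, 0.8]
f = sum(a * chirp_column(p, r, m) for a, r, m in zip(amps, rates, freqs))

# Analogue of lines 3-7: multiply f pointwise with a shifted copy of itself.
# For a single chirp, f(x+e)*conj(f(x)) is a pure tone at frequency 2*e*r (mod p), so the FFT
# of the product shows one spike per chirp rate; the cross terms spread out like noise.
e = 1
g = np.roll(f, -e) * np.conj(f)                 # g(x) = f(x+e) * conj(f(x))
spectrum = np.abs(np.fft.fft(g))
peaks = np.argsort(spectrum)[-3:]               # the 3 strongest tones
recovered = sorted(int(peak) * pow(2 * e, -1, p) % p for peak in peaks)
print(recovered)                                # matches sorted(rates) = [3, 45, 90]
```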

We now explain how the StRIP property provides performance guarantees for the Quadratic Reconstruction Algorithm. At each iteration, the algorithm returns the location of one of the significant entries and an estimate for the value of that entry. The StRIP property guarantees that the estimate is within δ of the true value. These errors compound as the algorithm iterates, but since the chirp cross terms and noise are uniformly distributed in the Walsh–Hadamard domain, the error in recovery is bounded by the difference between the true signal α and its best k-term approximation α_k. More precisely, if Φ is (k, δ, ε)-StRIP, if the positions of the significant entries are chosen uniformly at random, if the near-zero entries and the measurement noise come from a Gaussian distribution, and if the Quadratic Recovery Algorithm is used to recover an approximation α̂ for α, then with overwhelming probability

(33)

The role of the StRIP property is to bound the error of approximation in Step 11 of the Quadratic Reconstruction Algorithm. Note that if it were somehow possible to identify the support of α beforehand, then the UStRIP property would guarantee that we would be able to recover the signal values by linear regression. However, identifying the support of a k-sparse signal is known to be almost as hard as full reconstruction, and that is why our algorithm finds the location and estimates the signal value simultaneously, and does so one location at a time.

Note that the error bound is of the form

(34)

This bound is tighter than the ℓ2/ℓ1 bounds of random ensembles [2], and the ℓ1/ℓ1 bounds of expander-based methods [6].
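For readers who want the shapes of the three guarantees side by side, the display below records the standard forms being compared; the first line is the ℓ2/ℓ2 shape described above, and the constants are ours, not the paper's.

```latex
% Shape of the guarantee in (33)-(34) for the quadratic algorithm (l2 error vs. l2 tail):
\|\hat{\alpha}-\alpha\|_{2} \le c_{1}\,\|\alpha-\alpha_{k}\|_{2}
% Typical guarantee for random (RIP) ensembles [2] (l2 error vs. l1 tail):
\|\hat{\alpha}-\alpha\|_{2} \le \frac{c_{2}}{\sqrt{k}}\,\|\alpha-\alpha_{k}\|_{1}
% Typical guarantee for expander-based (RIP-1) methods [6] (l1 error vs. l1 tail):
\|\hat{\alpha}-\alpha\|_{1} \le c_{3}\,\|\alpha-\alpha_{k}\|_{1}
```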

VI. RESILIENCE TO NOISE

A. Noisy Measurements

In this section, we consider deterministic sensing matrices satisfying the hypotheses of Theorem 8, and show resilience to i.i.d. Gaussian noise that is uncorrelated with the measured signal. Note that we have introduced the square of … in (35) merely to simplify the notation in the proof. (This could be, for instance, picked so that … , where … has the same meaning as in Theorem 8.)

Theorem 21: Let Φ and α be such that

    …        (35)

with probability exceeding 1 − ε, and let f = Φα + n, where the noise samples n(x) are i.i.d. complex Gaussian random variables with zero mean and variance σ². Then, for … ,

    …        (36)

with probability greater than … , where … .

Proof: First consider the probability that ‖f‖ exceeds the upper bound in (36). Setting … , we have … . The estimate for the lower bound is similar, and the desired bound then follows from the union bounds.

B. Noisy Signals

If the signal α is contaminated by white Gaussian noise ν, then the measurements are given by

    f = Φ(α + ν),        (37)

where ν is complex multivariate Gaussian distributed, with zero mean and covariance

    E[ν ν*] = σ² I_C.        (38)

The reconstruction algorithm thus needs to recover the signal α from the noisy measurements

    f = Φα + n,   with n = Φν,        (39)

where n is complex multivariate Gaussian distributed with mean zero and covariance

    E[n n*] = σ² Φ Φ*.        (40)

The deterministic compressive sensing schemes considered in this paper have some advantage over random compressive sensing schemes in that Φ Φ* is a multiple of the identity matrix I_N, and consequently E[n n*] is also a multiple of I_N, i.e., the noise samples on distinct measurements are independent. One can thus use the estimates of the previous subsection again. Noise of this type is of course harder to deal with; this is illustrated here by the measurement variance being a (possibly huge) factor larger than the source noise variance σ².
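A quick numerical illustration of this point, reusing the unnormalized chirp construction sketched in Section II-A (our construction, not the paper's exact one): the empirical covariance of Φν is proportional to the identity, so source noise stays white, but much larger, in the measurement domain.

```python
import numpy as np
from itertools import product

def chirp_matrix(p):                       # same construction as the Section II-A sketch
    x = np.arange(p)
    return np.stack([np.exp(2j * np.pi * ((r * x * x + m * x) % p) / p)
                     for r, m in product(range(p), range(p))], axis=1)

p, sigma, trials = 13, 0.1, 20000
Phi = chirp_matrix(p)                      # p x p^2, unimodular entries, orthogonal rows
rng = np.random.default_rng(1)
noise = sigma / np.sqrt(2) * (rng.standard_normal((p * p, trials)) +
                              1j * rng.standard_normal((p * p, trials)))
n = Phi @ noise                            # measurement-domain noise, one realization per column
cov = (n @ n.conj().T) / trials            # empirical covariance; Phi Phi* = C I_N gives sigma^2 * C * I_N
print(np.allclose(cov, sigma**2 * p * p * np.eye(p), atol=0.05 * sigma**2 * p * p))
```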


VII. CONCLUSION

We have provided simple criteria that, when satisfied by a deterministic sensing matrix, guarantee successful recovery of all but an exponentially small fraction of k-sparse signals. These criteria are satisfied by many families of deterministic sensing matrices, including those formed from subcodes of the second-order binary Reed–Muller code. The criteria also apply to random Fourier ensembles, where they improve known bounds on the number of measurements required for sparse reconstruction. Our proof of unique reconstruction uses a version of the classical McDiarmid inequality that may be of independent interest.

We have described a reconstruction algorithm for Reed–Muller sensing matrices that takes special advantage of the code structure. Our algorithm requires only vector-vector multiplication in the measurement domain, and as a result, reconstruction complexity is only quadratic in the number of measurements. This improves upon standard reconstruction algorithms such as Basis and Matching Pursuit that require matrix-vector multiplication and have complexity that is super-linear in the dimension of the data domain.

APPENDIX A
PROPERTIES OF DELSARTE–GOETHALS SENSING MATRICES

First we prove that the columns of the Delsarte–Goethals sensing matrix form a group under pointwise multiplication.

Proposition A.1: Let G be the set of column vectors φ_{P,b}, where

    φ_{P,b}(x) = …   for x ∈ F_2^m,

where b ∈ F_2^m and where the binary symmetric matrix P varies over the Delsarte–Goethals set DG(m, r). Then G is a group of order 2^{(r+2)m} under pointwise multiplication.

Proof: We have

where is used to emphasize addition in . Writewhere is a binary symmetric matrix.

Observe that , where the diagonalis a pointwise product of and .

Thus, see equation at the bottom of the page, and is closedunder pointwise multiplication. Hence the possible inner prod-ucts of columns are exactly the possible columnsums for columns , where .

Next we verify property (St3).

Proposition A.2: Let be a binary symmetric matrix with rank and let . If

then either or

where

Proof: We have

Changing variables to and gives

Since the diagonal of a binary symmetric matrix is contained in the row space of , there exists a solution . The solutions to the equation form a vector space of dimension , and for all

Hence,

The map is a linear map from to , so the numerator also determines a linear map from to

(here we identify and ). If this linear map is the zero map then


and if it is not zero then . Note that given , there are ways to choose so that is the zero map.
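Proposition A.2 ties the magnitude of the exponential sum to the binary rank of P, and for small m the relationship is easy to confirm numerically. Since the displayed formulas above did not survive extraction, the sketch below works with a reconstructed form of the statement, assumed to be that the sum S over all x of i^{(2b+xP)x^T} is either zero or has magnitude 2^{m - R/2}, where R is the rank of P over F_2; this reconstruction is an assumption drawn from context, not a quotation of the proposition.

    import itertools
    import numpy as np

    m = 4
    X = np.array(list(itertools.product([0, 1], repeat=m)))

    def f2_rank(P):
        # Gaussian elimination over F_2
        A = P.copy() % 2
        rank = 0
        for col in range(m):
            pivot = next((r for r in range(rank, m) if A[r, col]), None)
            if pivot is None:
                continue
            A[[rank, pivot]] = A[[pivot, rank]]
            for r in range(m):
                if r != rank and A[r, col]:
                    A[r] ^= A[rank]
            rank += 1
        return rank

    rng = np.random.default_rng(3)
    trials = 500
    for _ in range(trials):
        A = rng.integers(0, 2, size=(m, m))
        P = np.triu(A) ^ np.triu(A, 1).T          # random binary symmetric matrix
        b = rng.integers(0, 2, size=m)
        expo = (np.einsum('ij,jk,ik->i', X, P, X) + 2 * (X @ b)) % 4
        S = np.sum(1j ** expo)
        R = f2_rank(P)
        assert np.isclose(abs(S), 0) or np.isclose(abs(S), 2 ** (m - R / 2))
    print("|S| was 0 or 2^(m - rank/2) in all", trials, "trials")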

The Delsarte–Goethals sensing matrix is a matrix with rows and columns. These columns are the union of mutually unbiased bases, where vectors in one orthogonal basis look like noise to all other orthogonal bases.

APPENDIX B
GENERALIZED MCDIARMID’S INEQUALITY

The method of “independent bounded differences” ([35]) gives large-deviation concentration bounds for multivariate functions in terms of the maximum effect on the function value of changing just one coordinate. This method has been widely used in combinatorial applications, and in learning theory. In this appendix, we prove that a modification of McDiarmid’s inequality also holds for distinct (in contrast to independent) random variables; our proof consists again in forming martingale sequences.
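As an empirical illustration of the phenomenon this appendix formalizes, the sketch below evaluates a function of k distinct draws (sampling without replacement, i.e., a point of the space of distinct tuples) whose bounded differences are 1/k, and checks that its deviations from the mean stay within the sub-Gaussian scale a bounded-differences bound would predict. The function and the fixed vector it averages are illustrative choices, not objects from the paper.

    import numpy as np

    # Illustration: a function of k DISTINCT uniform draws (sampling without
    # replacement) whose bounded differences are c_i = 1/k still concentrates
    # around its mean at scale ~ sqrt(sum_i c_i^2) = 1/sqrt(k), in the spirit
    # of the self-avoiding McDiarmid inequality.  `values` is an arbitrary
    # fixed vector with entries in [0, 1].
    rng = np.random.default_rng(4)
    C, k = 10000, 200
    values = rng.uniform(0, 1, size=C)

    def f(indices):
        return values[indices].mean()    # changing one coordinate moves f by at most 1/k

    samples = np.array([f(rng.choice(C, size=k, replace=False)) for _ in range(5000)])
    eps = 1 / np.sqrt(k)
    print("empirical P(|f - Ef| >= eps)            :", np.mean(np.abs(samples - samples.mean()) >= eps))
    print("bounded-differences bound 2*exp(-2*eps^2*k):", 2 * np.exp(-2 * eps ** 2 * k))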

We first introduce some notation. Let be probability spaces and define as the probability space of all distinct -tuples, that is,

such that (41)

(This definition is spelled out in more detail at the end of Section III-B5.) Let be a function from to , and let be the corresponding random variable on . Denote by the -tuple of random variables on the probability space . (The “complete” -tuple will also be denoted by just .) Analogously, define to be the -tuple of random variables . We shall also use the notations , ; and are defined analogously.

Theorem B.1 (Self-Avoiding McDiarmid Inequality):

Let be the probability space defined in (41), and let be a function such that for any index , and any , see (42) at the bottom of the page. Then for any positive

(43)
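The displayed bounded-differences condition (42) and bound (43) were lost in extraction. For orientation, the classical McDiarmid inequality for independent random variables, which Theorem B.1 parallels on the space of distinct tuples from (41), reads as follows; the exact constants in (43) may differ and should be taken from the original.

% Classical bounded-differences (McDiarmid) inequality for independent X_1,...,X_k,
% stated only for reference alongside the self-avoiding variant proved here.
\[
  \bigl| f(x_1,\dots,x_i,\dots,x_k) - f(x_1,\dots,x_i',\dots,x_k) \bigr| \le c_i
  \quad \text{for every } i \text{ and every admissible } x_i',
\]
\[
  \Pr\bigl[\, \lvert f(X_1,\dots,X_k) - \mathbb{E}\, f(X_1,\dots,X_k) \rvert \ge \epsilon \,\bigr]
  \;\le\; 2 \exp\!\left( - \frac{2\epsilon^2}{\sum_{i=1}^{k} c_i^2} \right).
\]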

Our proof will invoke Hoeffding’s Lemma [35].

Proposition B.2 (Hoeffding’s Lemma): Let be a random variable with and ; then for
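The displayed inequality of Proposition B.2 was also lost in extraction; the standard form of Hoeffding’s Lemma, which is presumably what is meant, is the following sub-Gaussian bound on the moment generating function of a bounded, mean-zero random variable.

% Hoeffding's Lemma (standard form): for a random variable V with
% E[V] = 0 and a <= V <= b almost surely,
\[
  \mathbb{E}\bigl[ e^{sV} \bigr] \;\le\; \exp\!\left( \frac{s^2 (b-a)^2}{8} \right)
  \qquad \text{for all } s \in \mathbb{R}.
\]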

In our proof, we will also make use of the functions

where

As a result, for all in , see (44) at the bottom of the page, is less than . This implies, for all

(45)

(46)

(47)

or

(48)

Until now, we have viewed each as a function on the subset of ; it is straightforward to lift the to functions on all of . The can also be considered

(42)

(44)


as random variables on , depending only on the first components of

(The subscript on the expectation indicates that one averages only with respect to the variables listed in the subscript, in this case the last variables. We adopt this subscript convention in what follows; only expectations without subscript are with respect to the whole probability space .)

Viewing the as random variables, we observe that , and that . Because of the restriction to , the random variables are not independent. However, with respect to averaging over , the constitute a martingale in the following sense:

(49)

Proof: Using Markov’s inequality, we see that for any positive

(50)

Since , we can rewrite this as

By marginalization of the expectation, see the equation at the top of the page, where we have used that each depends on only the first components of , so that only is affected by the averaging over .

By (48), we have, for all , , which can also be rewritten as .

Because of the martingale property (49) we have

. Combining these last two observations with Hoeffding’s Lemma [35], we conclude with the second equation shown at the top of the page. Substituting this into (50), we obtain

(51)

Since (51) is valid for any , we can optimize over . By substituting the value we get

Replacing the function by , it follows that ; union bounds therefore imply that
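The displays that close the proof were lost in extraction. For reference, the standard Chernoff-style calculation they summarize is sketched below: the moment bound is optimized over s, and applying the one-sided result to -f together with a union bound yields the two-sided inequality. The constants shown are those of the classical argument and are an assumption about the missing displays.

% Standard optimization over s (Chernoff argument), shown for reference only.
\[
  \Pr\bigl[ f - \mathbb{E} f \ge \epsilon \bigr]
  \;\le\; \exp\!\left( -s\epsilon + \frac{s^2}{8} \sum_{i=1}^{k} c_i^2 \right)
  \;\le\; \exp\!\left( - \frac{2\epsilon^2}{\sum_{i=1}^{k} c_i^2} \right)
  \quad \text{at } s = \frac{4\epsilon}{\sum_{i=1}^{k} c_i^2},
\]
\[
  \Pr\bigl[\, \lvert f - \mathbb{E} f \rvert \ge \epsilon \,\bigr]
  \;\le\; 2 \exp\!\left( - \frac{2\epsilon^2}{\sum_{i=1}^{k} c_i^2} \right)
  \quad \text{(union bound over } \pm f \text{)}.
\]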

ACKNOWLEDGMENT

The authors would like to thank L. Applebaum, R. Baraniuk, D. Cochran, I. Daubechies, A. Gilbert, S. Gurevich, R. Hadani, P. Indyk, R. Schapire, J. Tropp, R. Ward, and the anonymous reviewers for their insights and helpful suggestions.

REFERENCES

[1] R. Berinde, A. Gilbert, P. Indyk, H. Karloff, and M. Strauss, “Combining geometry and combinatorics: A unified approach to sparse signal recovery,” in Proc. 46th Annu. Allerton Conf. Commun., Control, Comput., Sep. 2008, pp. 798–805.

[2] E. Candès, J. Romberg, and T. Tao, “Stable signal recovery from incomplete and inaccurate measurements,” Commun. Pure Appl. Math., vol. 59, no. 8, pp. 1207–1223, 2006.


[3] A. Gilbert, M. Strauss, J. Tropp, and R. Vershynin, “One sketch for all: Fast algorithms for compressed sensing,” in Proc. 39th Annu. ACM Symp. Theory of Comput. (STOC), 2007, pp. 237–246.

[4] G. Cormode and S. Muthukrishnan, “Combinatorial algorithms for compressed sensing,” in Proc. 40th Annu. Conf. Inf. Sci. Syst. (CISS), Princeton, NJ, 2006.

[5] S. Jafarpour, W. Xu, B. Hassibi, and R. Calderbank, “Efficient and robust compressed sensing using optimized expander graphs,” IEEE Trans. Inf. Theory, vol. 55, no. 9, pp. 4299–4308, Sep. 2009.

[6] P. Indyk and M. Ruzic, “Near-optimal sparse recovery in the ℓ1 norm,” in Proc. 49th Annu. IEEE Symp. Foundations of Comput. Sci. (FOCS’08), 2008, pp. 199–207.

[7] D. Needell and J. A. Tropp, “CoSaMP: Iterative signal recovery from incomplete and inaccurate samples,” Appl. Comput. Harmonic Anal., vol. 26, no. 3, pp. 301–321, May 2009.

[8] W. Dai and O. Milenkovic, “Subspace pursuit for compressive sensing: Closing the gap between performance and complexity,” IEEE Trans. Inf. Theory, vol. 55, no. 5, pp. 2230–2249, May 2009.

[9] D. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.

[10] E. Candès and T. Tao, “Near optimal signal recovery from random projections: Universal encoding strategies,” IEEE Trans. Inf. Theory, vol. 52, no. 12, pp. 5406–5425, Dec. 2006.

[11] E. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006.

[12] R. A. DeVore, “Deterministic constructions of compressed sensing matrices,” J. Complexity, vol. 23, no. 4–6, pp. 918–925, Aug.–Dec. 2007.

[13] P. Indyk, “Explicit constructions for compressed sensing of sparse signals,” in Proc. 19th Annu. ACM-SIAM Symp. Discrete Algorithms (SODA), Jan. 2008, pp. 30–33.

[14] B. Kashin, “The widths of certain finite dimensional sets and classes of smooth functions,” Izvestia, vol. 41, pp. 334–351, 1977.

[15] E. D. Gluskin, “Norms of random matrices and widths of finite dimensional sets,” Math USSR Sbornik, vol. 48, pp. 173–182, 1984.

[16] W. Bajwa, J. Haupt, G. Raz, S. Wright, and R. Nowak, “Toeplitz-structured compressed sensing matrices,” in Proc. IEEE/SP 14th Workshop Statist. Signal Process., Aug. 2007, pp. 294–298.

[17] J. Tropp, M. Wakin, M. Duarte, D. Baron, and R. Baraniuk, “Random filters for compressive sampling and reconstruction,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., May 2006, vol. III, pp. 872–875.

[18] V. Guruswami, J. Lee, and A. Razborov, “Almost Euclidean subspaces of ℓ1^N via expander codes,” in Proc. 19th Annu. ACM-SIAM Symp. Discrete Algorithms (SODA), Jan. 2008, pp. 353–362.

[19] D. Baron, S. Sarvotham, and R. Baraniuk, “Bayesian compressed sensing via belief propagation,” Elect. Eng. Dept., Rice Univ., 2006, TREE 0601.

[20] M. Akçakaya, J. Park, and V. Tarokh, “Compressive sensing using low density frames,” IEEE Trans. Signal Process., 2009, submitted for publication.

[21] M. Akçakaya and V. Tarokh, “A frame construction and a universal distortion bound for sparse representations,” IEEE Trans. Signal Process., vol. 56, no. 6, pp. 2443–2450, Jun. 2008.

[22] W. Xu and B. Hassibi, “Efficient compressive sensing with deterministic guarantees using expander graphs,” in Proc. IEEE Inf. Theory Workshop, Lake Tahoe, NV, Sep. 2007, vol. III, pp. 414–419.

[23] L. Applebaum, S. Howard, S. Searle, and R. Calderbank, “Chirp sensing codes: Deterministic compressed sensing measurements for fast recovery,” Appl. Comput. Harmonic Anal., vol. 26, no. 2, pp. 283–290, Mar. 2009.

[24] S. Howard, R. Calderbank, and S. Searle, “A fast reconstruction algorithm for deterministic compressive sensing using second order Reed–Muller codes,” in Proc. Conf. Inf. Sci. Syst. (CISS), Princeton, NJ, Mar. 2008, pp. 11–15.

[25] R. Calderbank, S. Howard, and S. Jafarpour, “Sparse reconstruction via the Reed–Muller sieve,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), 2010, submitted for publication.

[26] M. R. de Prony, “Essai expérimental et analytique,” J. École Polytech. Paris, vol. 1, pp. 24–76, 1795.

[27] J. Wolf, “Decoding of Bose–Chaudhuri–Hocquenghem codes and Prony’s method for curve fitting,” IEEE Trans. Inf. Theory, vol. IT-13, no. 4, p. 608, Oct. 1967.

[28] S. Gurevich and R. Hadani, “Incoherent dictionaries and the statistical restricted isometry property,” IEEE J. Sel. Topics Signal Process., 2008, submitted for publication.

[29] R. Horn and C. Johnson, Matrix Analysis. Cambridge, U.K.: Cambridge Univ. Press, 1985.

[30] A. R. Hammons, P. V. Kumar, A. R. Calderbank, N. J. A. Sloane, and P. Solé, “The Z4-linearity of Kerdock, Preparata, Goethals, and related codes,” IEEE Trans. Inf. Theory, vol. 40, no. 2, pp. 301–319, Mar. 1994.

[31] A. M. Kerdock, “A class of low-rate nonlinear binary codes,” Inf. Control, vol. 20, pp. 182–187, 1972.

[32] P. Delsarte and J. M. Goethals, “Alternating bilinear forms over GF(q),” J. Combinatorial Theory, vol. 19, pp. 26–50, 1975.

[33] F. J. MacWilliams and N. J. A. Sloane, The Theory of Error-Correcting Codes. Amsterdam, The Netherlands: North-Holland, 1977.

[34] N. Ailon and E. Liberty, “Fast dimension reduction using Rademacher series on dual BCH codes,” in Proc. 19th Annu. ACM-SIAM Symp. Discrete Algorithms (SODA), Jan. 2008, pp. 215–224.

[35] C. McDiarmid, “On the method of bounded differences,” in Surveys in Combinatorics (Norwich, 1989). Cambridge, U.K.: Cambridge Univ. Press, 1989, pp. 148–188.

[36] J. Tropp, “On the conditioning of random subdictionaries,” Appl. Comput. Harmonic Anal., vol. 25, pp. 1–24, 2008.

Robert Calderbank (M’89–SM’97–F’98) received the B.Sc. degree from Warwick University, Coventry, U.K., in 1975, the M.Sc. degree from Oxford University, Oxford, U.K., in 1976, and the Ph.D. degree from the California Institute of Technology, Pasadena, in 1980, all in mathematics.

He is a Professor of Electrical Engineering and Mathematics at Princeton University, Princeton, NJ, where he directs the Program in Applied and Computational Mathematics. He joined Bell Telephone Laboratories as a Member of Technical Staff in 1980, and retired from AT&T in 2003 as Vice President of Research. He has made significant contributions to a wide range of research areas, from algebraic coding theory and quantum computing to wireless communication and active sensing.

Dr. Calderbank served as Editor in Chief of the IEEE TRANSACTIONS ON INFORMATION THEORY from 1995 to 1998, and as Associate Editor for Coding Techniques from 1986 to 1989. He was a member of the Board of Governors of the IEEE Information Theory Society from 1991 to 1996 and began a second term in 2006. He was honored by the IEEE Information Theory Prize Paper Award in 1995 for his work on the Z4 linearity of Kerdock and Preparata codes (joint with A. R. Hammons, Jr., P. V. Kumar, N. J. A. Sloane, and P. Solé), and again in 1999 for the invention of space–time codes (joint with V. Tarokh and N. Seshadri). He received the 2006 IEEE Donald G. Fink Prize Paper Award and the IEEE Millennium Medal, and was elected to the U.S. National Academy of Engineering in 2005.

Stephen Howard (M’09) received the B.S. degree from La Trobe University, Melbourne, Australia, in 1980 and the M.S. and Ph.D. degrees in mathematics from La Trobe University in 1984 and 1990, respectively.

He joined the Australian Defence Science and Technology (DSTO) in 1991, where he has been involved in research in the area of electronic surveillance and radar systems. He has led the DSTO research effort into the development of algorithms in all areas of surveillance, including radar pulse-train deinterleaving, precision radar parameter estimation and tracking, estimation of radar intrapulse modulation, and advanced geolocation techniques. Since 2003, he has led the DSTO long-range research program in radar resource management and waveform design.

Sina Jafarpour (A’09) received the B.Sc. degree in computer engineering (with great distinction) from Sharif University of Technology, Tehran, Iran, in July 2007, and the M.A. degree in computer science from Princeton University, Princeton, NJ, in January 2009. He is currently pursuing the Ph.D. degree in the Department of Computer Science, Princeton University, as a member of the Machine Learning and Theoretical Computer Science groups, and of the Van Gogh Project team in the Program in Applied and Computational Mathematics.

His research interests include compressive sampling, machine learning, coding theory, and signal processing.

Mr. Jafarpour received the Princeton University Fellowship award in July 2007. In October 2008, his research on compressive learning and detection received one of the five best student paper awards from Google and the New York Academy of Sciences.