
INCOHERENT DICTIONARIES AND THE STATISTICAL RESTRICTED ISOMETRY PROPERTY

SHAMGAR GUREVICH AND RONNY HADANI

Abstract. In this paper we formulate and prove a statistical version of the restricted isometry property (SRIP for short) which holds in general for any incoherent dictionary $D$ which is a disjoint union of orthonormal bases. In addition, we prove that, under appropriate normalization, the spectral distribution of the associated Gram operator converges in probability to the Sato-Tate (also called semi-circle) distribution. The result is then applied to various dictionaries that arise naturally in the setting of finite harmonic analysis, giving, in particular, a better understanding of a conjecture of Calderbank concerning RIP for the Heisenberg dictionary of chirp-like functions.

0. Introduction

Digital signals, or simply signals, can be thought of as complex valued functions on the finite field $\mathbb{F}_p$, where $p$ is a prime number. The space of signals $\mathcal{H} = \mathbb{C}(\mathbb{F}_p)$ is a Hilbert space of dimension $p$, with the inner product given by the standard formula
$$\langle f, g \rangle = \sum_{t \in \mathbb{F}_p} f(t)\overline{g(t)}.$$
A dictionary $D$ is simply a set of vectors (also called atoms) in $\mathcal{H}$. The number of vectors in $D$ can exceed the dimension of the Hilbert space $\mathcal{H}$; in fact, the most interesting situation is when $|D| \gg p = \dim \mathcal{H}$. In this set-up one can define a resolution of the Hilbert space $\mathcal{H}$ via $D$, which is the morphism of vector spaces
$$\Theta : \mathbb{C}(D) \to \mathcal{H},$$
defined by $\Theta(f) = \sum_{\varphi \in D} f(\varphi)\varphi$, for every $f \in \mathbb{C}(D)$. A more concrete way to think of the morphism $\Theta$ is as a $p \times |D|$ matrix with the columns being the atoms in $D$.

0.1. The restricted isometry property. Fix a natural number $n \in \mathbb{N}$ and a pair of positive real numbers $\delta_1, \delta_2 \in \mathbb{R}_{>0}$.

Definition 0.1. A dictionary $D$ satisfies the restricted isometry property (RIP for short) with coefficients $(\delta_1, \delta_2, n)$ if for every subset $S \subset D$ such that $|S| \leq n$,
$$(1 - \delta_2)\|f\| \leq \|\Theta(f)\| \leq (1 + \delta_1)\|f\|,$$
for every function $f \in \mathbb{C}(D)$ which is supported on the set $S$.

Date: July 2008. © Copyright by S. Gurevich and R. Hadani, July 1, 2008. All rights reserved.


Equivalently, RIP can be formulated in terms of the spectrum of the corresponding Gram operator. Let $G(S)$ denote the composition $\Theta_S^* \circ \Theta_S$, with $\Theta_S$ denoting the restriction of $\Theta$ to the subspace $\mathbb{C}_S(D) \subset \mathbb{C}(D)$ of functions supported on the set $S$. The dictionary $D$ satisfies $(\delta_1, \delta_2, n)$-RIP if for every subset $S \subset D$ such that $|S| \leq n$,
$$-\delta_2 \leq G(S) - \mathrm{Id}_S \leq \delta_1$$
in the sense of Hermitian operators, where $\mathrm{Id}_S$ is the identity operator on $\mathbb{C}_S(D)$.

Problem 0.2. Find a deterministic construction of a dictionary $D$ with $|D| \gg p$ which satisfies RIP with coefficients in the critical regime
$$(0.1)\qquad \delta_1, \delta_2 \ll 1 \quad\text{and}\quad n = \alpha \cdot p,$$
for some constant $0 < \alpha < 1$. As far as we know, to this day, there is no example of such a construction.

0.2. Incoherent dictionaries. Fix a positive real number $\mu \in \mathbb{R}_{>0}$.

Definition 0.3. A dictionary $D$ is called incoherent with coherence coefficient $\mu$ (also called $\mu$-coherent) if for every pair of distinct atoms $\varphi, \phi \in D$,
$$|\langle \varphi, \phi \rangle| \leq \frac{\mu}{\sqrt{p}}.$$
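As a toy numerical illustration of Definition 0.3 (our own sketch, not part of the paper): the union of the standard basis and the normalized Fourier basis of $\mathbb{C}(\mathbb{F}_p)$ is a $\mu = 1$-coherent dictionary, since every inner product between atoms taken from the two different bases has modulus exactly $1/\sqrt{p}$.

```python
import numpy as np

p = 101  # a prime; any prime works for this check

# Standard basis: columns of the identity matrix.
standard = np.eye(p)

# Normalized Fourier basis: F[t, w] = exp(2*pi*i*t*w/p) / sqrt(p).
t = np.arange(p).reshape(-1, 1)
w = np.arange(p).reshape(1, -1)
fourier = np.exp(2j * np.pi * t * w / p) / np.sqrt(p)

# Coherence between the two bases: max |<e_t, f_w>| equals 1/sqrt(p),
# i.e. the dictionary D = standard ∪ Fourier is 1-coherent.
inner = np.abs(standard.conj().T @ fourier)
print(inner.max() * np.sqrt(p))   # ~1.0, so mu = 1
```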

There are various explicit constructions of incoherent dictionaries. We will be interested, in particular, in three examples which arise naturally in the setting of finite harmonic analysis (for the sake of completeness we review the construction of these examples in Section 3):

• The first example [H, HCM], referred to as the Heisenberg dictionary $D_H$, is constructed using the Heisenberg representation of the finite Heisenberg group $H(\mathbb{F}_p)$. The Heisenberg dictionary is of size $|D_H| \approx p^2$ and its coherence coefficient is $\mu = 1$.

• The second example [GHS], which is referred to as the oscillator dictionary $D_O$, is constructed using the Weil representation of the finite symplectic group $SL_2(\mathbb{F}_p)$. The oscillator dictionary is of size $|D_O| \approx p^3$ and its coherence coefficient is $\mu = 4$.

• The third example [GHS], referred to as the extended oscillator dictionary $D_{EO}$, is constructed using the Heisenberg-Weil representation of the finite Jacobi group $J(\mathbb{F}_p) = SL_2(\mathbb{F}_p) \ltimes H(\mathbb{F}_p)$. The extended oscillator dictionary is of size $|D_{EO}| \approx p^5$ and its coherence coefficient is $\mu = 4$.

The three examples of dictionaries we just described constitute reasonable candidates for solving Problem 0.2: they are large in the sense that $|D| \gg p$ and, moreover, empirical evidence suggests that they satisfy RIP with coefficients in the critical regime (0.1). We summarize this as follows:

Conjecture 0.4. The dictionaries $D_H$, $D_O$ and $D_{EO}$ satisfy RIP with coefficients $\delta_1, \delta_2 \ll 1$ and $n = \alpha \cdot p$, for some $0 < \alpha < 1$.

0.3. Main results. In this paper we formulate a relaxed statistical version of RIP, called the statistical restricted isometry property (SRIP for short), and we prove that it holds for any incoherent dictionary $D$ which is, in addition, a disjoint union of orthonormal bases:
$$(0.2)\qquad D = \coprod_{x \in X} B_x,$$
where $B_x = \{b^1_x, \ldots, b^p_x\}$ is an orthonormal basis of $\mathcal{H}$, for every $x \in X$.

0.3.1. The statistical restricted isometry property. Let $D$ be an incoherent dictionary of the form (0.2). Roughly, the statement is that for $S \subset D$, $|S| = n$ with $n = p^{1-\varepsilon}$, $0 < \varepsilon < 1$, chosen uniformly at random, the operator norm $\|G(S) - \mathrm{Id}_S\|$ is small with high probability.

Theorem 0.5 (SRIP property). For every $k \in \mathbb{N}$, there exists a constant $C(k)$ such that
$$(0.3)\qquad P\left(\|G(S) - \mathrm{Id}_S\| \geq p^{-\varepsilon/2}\right) \leq C(k)\, p^{1-\varepsilon k/2}.$$

The above theorem, in particular, implies that $P\left(\|G(S) - \mathrm{Id}_S\| \geq p^{-\varepsilon/2}\right) \to 0$ as $p \to \infty$ faster than $p^{-\ell}$ for any $\ell \in \mathbb{N}$.

0.3.2. The statistics of the eigenvalues. A natural thing to know is how the eigenvalues of the Gram operator $G(S)$ fluctuate around 1. In this regard, we study the spectral statistics of the normalized error term
$$E(S) = (p/n)^{1/2}\,(G(S) - \mathrm{Id}_S).$$
Let $\rho_{E(S)} = n^{-1}\sum_{i=1}^{n} \delta_{\lambda_i}$ denote the spectral distribution of $E(S)$, where $\lambda_i$, $i = 1, \ldots, n$, are the real eigenvalues of the Hermitian operator $E(S)$. We prove that $\rho_E$ converges in probability as $p \to \infty$ to the Sato-Tate distribution (also called the semi-circle distribution) $\rho_{ST}(x) = \frac{1}{2\pi}\sqrt{4 - x^2}\cdot 1_{[-2,2]}(x)$, where $1_{[-2,2]}$ is the characteristic function of the interval $[-2, 2]$.

Theorem 0.6 (Sato-Tate).
$$(0.4)\qquad \lim_{p \to \infty} \rho_E \overset{P}{=} \rho_{ST}.$$

Remark 0.7. A limit of the form (0.4) is familiar in random matrix theory as the asymptotic of the spectral distribution of Wigner matrices. Interestingly, the same asymptotic distribution appears in our situation, albeit the probability spaces are of a different nature (our probability spaces are, in particular, much smaller).

In particular, Theorems 0.5, 0.6 can be applied to the three examples $D_H$, $D_O$ and $D_{EO}$, which are all of the appropriate form (0.2). Finally, we remark that our result gives new information on a conjecture of Calderbank concerning RIP of the Heisenberg dictionary.

Remark 0.8. For practical applications, it might be important to compute explicitly the constants $C(k)$ which appear in (0.3). These constants depend on the coherence coefficient $\mu$; therefore, for a fixed $p$, having $\mu$ as small as possible is preferable.


0.3.3. Structure of the paper. The paper consists of four sections besides the introduction. In Section 1, we develop the statistical theory of systems of incoherent orthonormal bases. We begin by specifying the basic set-up. Then we proceed to formulate and prove the main theorems of this paper: Theorem 1.2, Theorem 1.3 and Theorem 1.4. The main technical statement underlying the proofs is formulated in Theorem 1.5. In Section 2, we prove Theorem 1.5. In Section 3, we review the constructions of the dictionaries $D_H$, $D_O$ and $D_{EO}$. Finally, in Appendix A, we prove all technical statements which appear in the body of the paper.

Acknowledgement 0.9. It is a pleasure to thank J. Bernstein for his interest and guidance in the mathematical aspects of this work. We are grateful to R. Calderbank for explaining to us his work. We would like to thank A. Sahai for many interesting discussions. Finally, we thank B. Sturmfels for encouraging us to proceed in this line of research.

1. The statistical theory of incoherent bases

1.1. Standard Terminology.

1.1.1. Terminology from asymptotic analysis. Let $\{a_p\}, \{b_p\}$ be a pair of sequences of positive real numbers. We write $a_p = O(b_p)$ if there exist $C > 0$ and $P_0 \in \mathbb{N}$ such that $a_p \leq C \cdot b_p$ for every $p \geq P_0$. We write $a_p = o(b_p)$ if $\lim_{p \to \infty} a_p/b_p = 0$. Finally, we write $a_p \approx b_p$ if $\lim_{p \to \infty} a_p/b_p = 1$.

1.1.2. Terminology from set theory. Let $n \in \mathbb{N}_{\geq 1}$. We denote by $[1, n]$ the set $\{1, 2, \ldots, n\}$. Given a finite set $A$, we denote by $|A|$ the number of elements in $A$.

1.2. Basic set-up.

1.2.1. Incoherent orthonormal bases. Let $\{(\mathcal{H}_p, \langle\cdot,\cdot\rangle_p)\}$ be a sequence of Hilbert spaces such that $\dim \mathcal{H}_p = p$.

Definition 1.1. Two (sequences of) orthonormal bases $B_p, B'_p$ of $\mathcal{H}_p$ are called $\mu$-coherent if
$$|\langle b, b' \rangle| \leq \frac{\mu}{\sqrt{p}},$$
for every $b \in B_p$ and $b' \in B'_p$, where $\mu$ is some fixed (does not depend on $p$) positive real number.

Fix $\mu \in \mathbb{R}_{>0}$. Let $\{X_p\}$ be a sequence of sets such that $\lim_{p\to\infty}|X_p| = \infty$ (usually we will have $p = o(|X_p|)$), and such that each $X_p$ parametrizes orthonormal bases of $\mathcal{H}_p$ which are pairwise $\mu$-coherent; that is, for every $x \in X_p$ there is an orthonormal basis $B_x = \{b^1_x, \ldots, b^p_x\}$ of $\mathcal{H}_p$ so that
$$(1.1)\qquad |\langle b^i_x, b^j_y \rangle| \leq \frac{\mu}{\sqrt{p}},$$
for every $x \neq y \in X_p$. Denote
$$D_p = \coprod_{x \in X_p} B_x.$$
The set $D_p$ will be referred to as an incoherent dictionary, or sometimes more precisely as a $\mu$-coherent dictionary.


1.2.2. Resolutions of Hilbert spaces. Let $\Theta_p : \mathbb{C}(D_p) \to \mathcal{H}_p$ be the morphism of vector spaces given by
$$\Theta_p(f) = \sum_{b \in D_p} f(b)\, b.$$
The map $\Theta_p$ will be referred to as the resolution of $\mathcal{H}_p$ via $D_p$.

Convention: for the sake of clarity we will usually omit the subscript $p$ from the notation.

1.3. Statistical restricted isometry property (SRIP). The main statement of this paper concerns a formulation of a statistical restricted isometry property (SRIP for short) of the resolution maps $\Theta$.

Let $n = n(p) = p^{1-\varepsilon}$, for some $0 < \varepsilon < 1$. Let $\Omega_n$ denote the set of injective maps
$$\Omega_n = \{S : [1, n] \hookrightarrow D\}.$$
We consider the set $\Omega_n$ as a probability space equipped with the uniform probability measure.

Given a map $S \in \Omega_n$, it induces a morphism of vector spaces $S : \mathbb{C}([1, n]) \to \mathbb{C}(D)$ given by $S(\delta_i) = \delta_{S(i)}$. Let us denote by $\Theta_S : \mathbb{C}([1, n]) \to \mathcal{H}$ the composition $\Theta \circ S$ and by $G(S) \in \mathrm{Mat}_{n \times n}(\mathbb{C})$ the Hermitian matrix
$$G(S) = \Theta_S^* \circ \Theta_S.$$
Concretely, $G(S)$ is the matrix $(g_{ij})$ where $g_{ij} = \langle S(i), S(j) \rangle$. In plain language, $G(S)$ is the Gram matrix associated with the ordered set of vectors $(S(1), \ldots, S(n))$ in $\mathcal{H}$.

We consider $G : \Omega_n \to \mathrm{Mat}_{n \times n}(\mathbb{C})$ as a matrix valued random variable on the probability space $\Omega_n$. The following theorem asserts that with high probability the matrix $G$ is close to the identity matrix $I_n \in \mathrm{Mat}_{n \times n}(\mathbb{C})$.

Theorem 1.2. Let $0 < e \ll 1$ and let $k \in \mathbb{N}$ be an even number such that $ek \geq 1$. Then
$$P\left(\|G - I_n\| \geq (n/p)^{1/(2+e)}\right) = O\left((n/p)^{ek/(2+e)}\, n\right).$$

For a proof, see Subsection 1.6.

In the above theorem, substituting $n = p^{1-\varepsilon}$ yields
$$P\left(\|G - I_n\| \geq p^{-\varepsilon/(2+e)}\right) = O\left(p^{-\varepsilon e(k+1)/(2+e)+1}\right).$$

Equivalently, Theorem 1.2 can be formulated as a statistical restricted isometry property of the resolution morphism $\Theta$. A given $S \in \Omega_n$ defines a morphism of vector spaces $\Theta_S = \Theta \circ S : \mathbb{C}([1, n]) \to \mathcal{H}$; in this respect, $\Theta$ can be considered as a random variable
$$\Theta : \Omega_n \to \mathrm{Mor}(\mathbb{C}([1, n]), \mathcal{H}).$$

Theorem 1.3 (SRIP property). Let $0 < e \ll 1$ and let $k \in \mathbb{N}$ be an even number such that $ek \geq 1$. Then
$$P\left(\sup\left\{\,\big|\,\|\Theta_S(f)\| - \|f\|\,\big| : f \in \mathbb{C}([1,n]),\ \|f\| = 1\right\} \geq (n/p)^{1/(2+e)}\right) = O\left((n/p)^{ek/(2+e)}\, n\right).$$
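To make Theorems 1.2 and 1.3 concrete, here is a numerical sketch of ours (not part of the proof, and with assumed parameters): sample a uniform random subset of size $n = \lfloor p^{1-\varepsilon}\rfloor$ from a union of incoherent orthonormal bases and record $\|G - I_n\|$. As a stand-in dictionary we use a standard family of chirp bases together with the standard basis, a $1$-coherent dictionary closely related to the Heisenberg dictionary of Section 3; the operator norm visibly decreases towards $0$ as $p$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.4                                    # assumed exponent: n = p^(1 - eps)

def chirp_atom(p, j, t):
    # Atom j of a chirp dictionary with p*(p+1) elements (p chirp bases plus the standard basis).
    if j < p * p:
        a, b = divmod(j, p)
        return np.exp(2j * np.pi * (a * t * t + b * t) / p) / np.sqrt(p)
    e = np.zeros(p); e[j - p * p] = 1.0
    return e

for p in (101, 401, 1009):                   # primes
    t = np.arange(p)
    n = int(p ** (1 - eps))
    S = rng.choice(p * (p + 1), size=n, replace=False)        # a uniform injective map S
    Theta_S = np.stack([chirp_atom(p, j, t) for j in S], axis=1)
    G = Theta_S.conj().T @ Theta_S                             # Gram matrix G(S)
    print(f"p = {p:5d}, n = {n:4d}, ||G - I|| = {np.linalg.norm(G - np.eye(n), 2):.3f}")
```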

1.4. Statistics of the error term. Let $E$ denote the normalized error term
$$E = (p/n)^{1/2}(G - I_n).$$
Our goal is to describe the statistics of the random variable $E$. Let $\rho_E$ denote the spectral distribution of $E$, namely
$$\rho_E = \frac{1}{n}\sum_{i=1}^{n}\delta_{\lambda_i(E)},$$
where $\lambda_1(E) \geq \lambda_2(E) \geq \ldots \geq \lambda_n(E)$ are the eigenvalues of $E$ indexed in decreasing order (we note that the eigenvalues of $E$ are real since it is a Hermitian matrix). The following theorem asserts that the spectral distribution $\rho_E$ converges in probability to the Sato-Tate distribution
$$(1.2)\qquad \rho_{ST}(x) = \frac{1}{2\pi}\sqrt{4 - x^2}\cdot 1_{[-2,2]}(x).$$

Theorem 1.4.
$$\lim_{p \to \infty} \rho_E \overset{P}{=} \rho_{ST}.$$

For a proof, see Subsection 1.7.
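The convergence in Theorem 1.4 can also be observed numerically. The sketch below (ours, with assumed parameters and the same toy chirp dictionary as in the sketch after Theorem 1.3) compares a histogram of the eigenvalues of $E = (p/n)^{1/2}(G - I_n)$ with the semi-circle density; for moderate $p$ the match is rough but already recognizable.

```python
import numpy as np

rng = np.random.default_rng(1)
p, eps = 1009, 0.3                      # assumed toy parameters
n = int(p ** (1 - eps))
t = np.arange(p)

def atom(j):
    # Same chirp dictionary as in the sketch after Theorem 1.3.
    if j < p * p:
        a, b = divmod(j, p)
        return np.exp(2j * np.pi * (a * t * t + b * t) / p) / np.sqrt(p)
    e = np.zeros(p); e[j - p * p] = 1.0
    return e

S = rng.choice(p * (p + 1), size=n, replace=False)
Theta_S = np.stack([atom(j) for j in S], axis=1)
E = np.sqrt(p / n) * (Theta_S.conj().T @ Theta_S - np.eye(n))
eigs = np.linalg.eigvalsh(E)            # real eigenvalues: E is Hermitian

# Compare the empirical spectral density on [-2, 2] with (1/2pi)*sqrt(4 - x^2).
hist, edges = np.histogram(eigs, bins=16, range=(-2, 2), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
semicircle = np.sqrt(np.maximum(4 - centers ** 2, 0)) / (2 * np.pi)
for c, h, s in zip(centers, hist, semicircle):
    print(f"x = {c:+.2f}   empirical {h:.3f}   semicircle {s:.3f}")
```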

1.5. The method of moments. The proofs of Theorems 1.2, 1.4 will be based on the method of moments.

Let $m_k$ denote the $k$th moment of the distribution $\rho_E$, that is,
$$m_k = \int_{\mathbb{R}} x^k\, d\rho_E(x) = \frac{1}{n}\sum_{i=1}^{n}\lambda_i(E)^k = \frac{1}{n}\mathrm{Tr}\left(E^k\right).$$
Similarly, let $m_{ST,k}$ denote the $k$th moment of the semi-circle distribution.

Theorem 1.5. For every $k \in \mathbb{N}$,
$$(1.3)\qquad \lim_{p \to \infty} \mathbb{E}(m_k) = m_{ST,k}.$$
In addition,
$$(1.4)\qquad \mathrm{Var}(m_k) = O\left(n^{-1}\right).$$

For a proof, see Section 2.

1.6. Proof of Theorem 1.2. Theorem 1.2 follows from Theorem 1.5 using the Markov inequality.

Let $\alpha > 0$ and $k \in \mathbb{N}$ an even number. First, observe that the condition $\|G - I_n\| \geq \alpha$ is equivalent to the condition $\|E\| \geq (p/n)^{1/2}\alpha$, which, in turn, is equivalent to the spectral condition $\lambda_{\max}(E) \geq (p/n)^{1/2}\alpha$ (here $\lambda_{\max}(E)$ denotes the spectral radius $\max_i|\lambda_i(E)|$). Since $\lambda_{\max}(E)^k \leq \lambda_1(E)^k + \lambda_2(E)^k + \ldots + \lambda_n(E)^k$, we can write
$$P\left(\lambda_{\max}(E) \geq (p/n)^{1/2}\alpha\right) = P\left(\lambda_{\max}(E)^k \geq (p/n)^{k/2}\alpha^k\right) \leq P\left(\sum_{i=1}^{n}\lambda_i(E)^k \geq (p/n)^{k/2}\alpha^k\right) = P\left(m_k \geq n^{-1}(p/n)^{k/2}\alpha^k\right).$$

By the triangle inequality $m_k \leq |m_k - \mathbb{E}m_k| + \mathbb{E}m_k$ (recall that $k$ is even, hence $m_k \geq 0$), therefore we can write
$$P\left(m_k \geq n^{-1}(p/n)^{k/2}\alpha^k\right) \leq P\left(|m_k - \mathbb{E}m_k| \geq n^{-1}(p/n)^{k/2}\alpha^k - \mathbb{E}m_k\right).$$
By (1.3), $\mathbb{E}m_k = O(1)$; in addition, substituting $\alpha = (n/p)^{1/(e+2)} = (p/n)^{-1/(e+2)}$ with $0 < e < 1$, we get $n^{-1}(p/n)^{k/2}\alpha^k = n^{-1}(p/n)^{\frac{ek}{2(2+e)}}$. Altogether, we can summarize the previous development with the following inequality:
$$P\left(\|G - I_n\| \geq (n/p)^{1/(e+2)}\right) \leq P\left(|m_k - \mathbb{E}m_k| \geq n^{-1}(p/n)^{\frac{ek}{2(2+e)}} + O(1)\right).$$
By the Markov inequality, $P(|m_k - \mathbb{E}m_k| \geq a) \leq \mathrm{Var}(m_k)/a^2$. Substituting $a = n^{-1}(p/n)^{\frac{ek}{2(2+e)}} + O(1)$ we get
$$P\left(|m_k - \mathbb{E}m_k| \geq n^{-1}(p/n)^{\frac{ek}{2(2+e)}} + O(1)\right) = O\left(n\,(n/p)^{ek/(2+e)}\right),$$
where in the last equality we used the estimate $\mathrm{Var}(m_k) = O(n^{-1})$ (see Theorem 1.5).

This concludes the proof of the theorem.

1.7. Proof of Theorem 1.4. Theorem 1.4 follows from Theorem 1.5 using the Markov inequality.

In order to show that $\lim_{p\to\infty}\rho_E \overset{P}{=} \rho_{ST}$, it is enough to show that for every $k \in \mathbb{N}$ and $\varepsilon > 0$ we have
$$\lim_{p\to\infty} P\left(|m_k - m_{ST,k}| \geq \varepsilon\right) = 0.$$
The proof of the last assertion proceeds as follows. By the triangle inequality we have $|m_k - m_{ST,k}| \leq |m_k - \mathbb{E}m_k| + |\mathbb{E}m_k - m_{ST,k}|$, therefore
$$P\left(|m_k - m_{ST,k}| \geq \varepsilon\right) \leq P\left(|m_k - \mathbb{E}m_k| + |\mathbb{E}m_k - m_{ST,k}| \geq \varepsilon\right).$$
By (1.3) there exists $P_0 \in \mathbb{N}$ such that $|\mathbb{E}m_k - m_{ST,k}| \leq \varepsilon/2$ for every $p \geq P_0$, hence
$$P\left(|m_k - \mathbb{E}m_k| + |\mathbb{E}m_k - m_{ST,k}| \geq \varepsilon\right) \leq P\left(|m_k - \mathbb{E}m_k| \geq \varepsilon/2\right),$$
for every $p \geq P_0$. Now, using the Markov inequality,
$$P\left(|m_k - \mathbb{E}m_k| \geq \varepsilon/2\right) \leq \frac{\mathrm{Var}(m_k)}{(\varepsilon/2)^2}.$$
This implies that
$$P\left(|m_k - m_{ST,k}| \geq \varepsilon\right) \leq \frac{\mathrm{Var}(m_k)}{(\varepsilon/2)^2} \xrightarrow{p\to\infty} 0,$$
where we used the estimate $\mathrm{Var}(m_k) = O(1/n)$ (Equation (1.4)).

This concludes the proof of the theorem.

2. Proof of Theorem 1.5

2.1. Preliminaries on matrix multiplication.


2.1.1. Paths.

Definition 2.1. A path of length $k$ on a set $A$ is a function $\gamma : [0, k] \to A$. The path $\gamma$ is called closed if $\gamma(0) = \gamma(k)$. The path $\gamma$ is called strict if $\gamma(j) \neq \gamma(j+1)$ for every $j = 0, \ldots, k-1$.

Given a path $\gamma : [0, k] \to A$, an element $\gamma(j) \in A$ is called a vertex of the path $\gamma$. A pair of consecutive vertices $(\gamma(j), \gamma(j+1))$ is called an edge of the path $\gamma$. Let $P_k(A)$ denote the set of strict closed paths of length $k$ on the set $A$, and $P_k(A; a, b)$, where $a, b \in A$, the set of strict paths of length $k$ on $A$ which begin at the vertex $a$ and end at the vertex $b$.

Conventions:
• We will consider only strict paths and refer to these simply as paths.
• When considering a closed path $\gamma \in P_k(A)$, it will sometimes be convenient to think of it as a function $\gamma : \mathbb{Z}/k\mathbb{Z} \to A$.

2.1.2. Graphs associated with paths. Given a path $\gamma$, we can associate to it an undirected graph $G_\gamma = (V_\gamma, E_\gamma)$, where the set of vertices is $V_\gamma = \mathrm{Im}\,\gamma$ and the set of edges $E_\gamma$ consists of all sets $\{a, b\} \subset A$ such that either $(a, b)$ or $(b, a)$ is an edge of $\gamma$.

Remark 2.2. Since the graph $G_\gamma$ is obtained from a path, it is connected and, moreover, $|V_\gamma|, |E_\gamma| \leq k$, where $k$ is the length of $\gamma$.

Definition 2.3. A closed path $\gamma \in P_k(A)$ is called a tree if the associated graph $G_\gamma$ is a tree and every edge $\{a, b\} \in E_\gamma$ is crossed exactly twice by $\gamma$, once as $(a, b)$ and once as $(b, a)$.

Let $T_k(A) \subset P_k(A)$ denote the set of trees of length $k$.

Remark 2.4. If $\gamma$ is a tree of length $k$ then $k$ must be even; moreover, $k = 2(|V_\gamma| - 1)$.

2.1.3. Isomorphism classes of paths. Let us denote by $\Sigma(A)$ the permutation group $\mathrm{Aut}(A)$. The group $\Sigma(A)$ acts on all sets which can be derived functorially from the set $A$; in particular, it acts on the set of closed paths $P_k(A)$ as follows: given $\sigma \in \Sigma(A)$, it sends a path $\gamma : [0, k] \to A$ to $\sigma \circ \gamma$.

An isomorphism class $\tau = [\gamma] \in P_k(A)/\Sigma(A)$ can be uniquely specified by a $(k+1)$-tuple of positive integers $(\tau_0, \ldots, \tau_k)$, where, for each $j$, the vertex $\gamma(j)$ is the $\tau_j$th distinct vertex crossed by $\gamma$. For example, the isomorphism class of the path $\gamma = (a, b, c, a, b, a)$ is specified by $[\gamma] = (1, 2, 3, 1, 2, 1)$. As a consequence we get that
$$(2.1)\qquad |[\gamma]| = |A|_{(|V_\gamma|)} = |A|(|A| - 1)\cdots(|A| - |V_\gamma| + 1).$$

2.1.4. The combinatorics of matrix multiplication. First let us fix some general notation: if the set $A$ is $[1, n]$ then we will denote
• $P_k = P_k([1, n])$, $P_k(i, j) = P_k([1, n]; i, j)$,
• $T_k = T_k([1, n])$,
• $\Sigma_n = \Sigma([1, n])$.

Let $M \in \mathrm{Mat}_{n \times n}(\mathbb{C})$ be a matrix such that $m_{ii} = 0$ for every $i \in [1, n]$. The $(i, j)$ entry $m^k_{i,j}$ of the $k$th power matrix $M^k$ can be described as a sum of contributions indexed by strict paths, that is,
$$m^k_{i,j} = \sum_{\gamma \in P_k(i,j)} w_\gamma,$$

where $w_\gamma = m_{\gamma(0),\gamma(1)} \cdot m_{\gamma(1),\gamma(2)} \cdots m_{\gamma(k-1),\gamma(k)}$. Consequently, we can describe the trace of $M^k$ as
$$(2.2)\qquad \mathrm{Tr}\left(M^k\right) = \sum_{i \in [1,n]}\ \sum_{\gamma \in P_k(i,i)} w_\gamma = \sum_{\gamma \in P_k} w_\gamma.$$
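Equation (2.2) is easy to verify by brute force; the following small sketch (our own, for a random matrix with zero diagonal and assumed toy sizes) compares the sum of path weights $w_\gamma$ over strict closed paths with the trace of $M^k$.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
n, k = 5, 4
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
np.fill_diagonal(M, 0)                      # m_ii = 0, as assumed in the text

# Sum of weights w_gamma over strict closed paths gamma : [0, k] -> [1, n],
# gamma(0) = gamma(k), gamma(j) != gamma(j + 1).
total = 0.0
for gamma in product(range(n), repeat=k):   # gamma(0..k-1); gamma(k) = gamma(0)
    path = gamma + (gamma[0],)
    if all(path[j] != path[j + 1] for j in range(k)):
        total += np.prod([M[path[j], path[j + 1]] for j in range(k)])

# The two quantities agree (difference is at floating-point precision).
print(abs(total - np.trace(np.linalg.matrix_power(M, k))))
```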

2.2. Fundamental estimates. Our goal here is to formulate the fundamental estimates that we will require for the proof of Theorem 1.5. Recall that
$$m_k = n^{-1}\mathrm{Tr}\left(E^k\right) = n^{-1}(p/n)^{k/2}\,\mathrm{Tr}\left((G - I_n)^k\right).$$
Since $(G - I_n)_{ii} = 0$ for every $i \in [1, n]$, we can write, using Equation (2.2), the moment $m_k$ in the form
$$(2.3)\qquad m_k = n^{-1}(p/n)^{k/2}\sum_{\gamma \in P_k} w_\gamma,$$
where $w_\gamma : \Omega_n \to \mathbb{C}$ is the random variable given by
$$w_\gamma(S) = \langle S \circ \gamma(0), S \circ \gamma(1)\rangle \cdot \ldots \cdot \langle S \circ \gamma(k-1), S \circ \gamma(k)\rangle.$$
Consequently, we get that
$$(2.4)\qquad \mathbb{E}m_k = n^{-1}(p/n)^{k/2}\sum_{\gamma \in P_k} \mathbb{E}w_\gamma.$$

Lemma 2.5. Let $\sigma \in \Sigma_n$; then $\mathbb{E}w_\gamma = \mathbb{E}w_{\sigma(\gamma)}$.

For a proof, see Appendix A.

Lemma 2.5 implies that the expectation $\mathbb{E}w_\gamma$ depends only on the isomorphism class $[\gamma]$; therefore we can write the sum (2.4) in the form
$$\mathbb{E}m_k = \sum_{\tau \in P_k/\Sigma_n} n^{-1}(p/n)^{k/2}\,|\tau|\,\mathbb{E}w_\tau,$$
where $\mathbb{E}w_\tau$ denotes the expectation $\mathbb{E}w_\gamma$ for any $\gamma \in \tau$. Let us denote
$$n(\tau) = n^{-1}(p/n)^{k/2}|\tau| = n^{-1}(p/n)^{k/2}\, n_{(|V_\tau|)} \approx p^{k/2}\, n^{|V_\tau| - 1 - k/2},$$
where in the second equality we used (2.1). We conclude the previous development with the following formula:
$$(2.5)\qquad \mathbb{E}m_k = \sum_{\tau \in P_k/\Sigma_n} n(\tau)\,\mathbb{E}w_\tau.$$

Theorem 2.6 (Fundamental estimates). Let $\tau \in P_k/\Sigma_n$.
(1) If $k > 2(|V_\tau| - 1)$ then
$$(2.6)\qquad \lim_{p\to\infty} n(\tau)\,\mathbb{E}w_\tau = 0.$$
(2) If $k \leq 2(|V_\tau| - 1)$ and $\tau$ is not a tree then
$$(2.7)\qquad \lim_{p\to\infty} n(\tau)\,\mathbb{E}w_\tau = 0.$$
(3) If $k \leq 2(|V_\tau| - 1)$ and $\tau$ is a tree then
$$(2.8)\qquad \lim_{p\to\infty} n(\tau)\,\mathbb{E}w_\tau = 1.$$

For a proof, see Subsection 2.4.

2.3. Proof of Theorem 1.5. The proof is a direct consequence of the fundamental estimates (Theorem 2.6).

2.3.1. Proof of Equation (1.3). Our goal is to show that $\lim_{p\to\infty}\mathbb{E}m_k = m_{ST,k}$. Using Equation (2.5) we can write
$$(2.9)\qquad \lim_{p\to\infty}\mathbb{E}m_k = \sum_{\tau \in P_k/\Sigma_n} \lim_{p\to\infty} n(\tau)\,\mathbb{E}w_\tau.$$
When $k$ is odd, no class $\tau \in P_k/\Sigma_n$ is a tree (see Remark 2.4); therefore, by Theorem 2.6, all the terms on the right side of (2.9) are equal to zero, which implies that in this case $\lim_{p\to\infty}\mathbb{E}m_k = 0$. When $k$ is even then, again by Theorem 2.6, only terms associated to trees yield a non-zero contribution to the right side of (2.9); therefore in this case
$$\lim_{p\to\infty}\mathbb{E}m_k = \sum_{\tau \in T_k/\Sigma_n} \lim_{p\to\infty} n(\tau)\,\mathbb{E}w_\tau = \sum_{\tau \in T_k/\Sigma_n} 1 = |T_k/\Sigma_n|.$$

For every $m \in \mathbb{N}$, let $\beta_m$ denote the $m$th Catalan number, that is,
$$\beta_m = \binom{2m}{m}\frac{1}{m+1}.$$
On the one hand, the number of isomorphism classes of trees in $T_k/\Sigma_n$ can be described in terms of the Catalan numbers:

Lemma 2.7. If $k = 2m$, $m \in \mathbb{N}$, then $|T_{2m}/\Sigma_n| = \beta_m$.

For a proof, see Appendix A.

On the other hand, the moments $m_{ST,k}$ of the Sato-Tate distribution are well known and can be described in terms of the Catalan numbers as well:

Lemma 2.8. If $k = 2m$ then $m_{ST,k} = \beta_m$; otherwise, if $k$ is odd, then $m_{ST,k} = 0$.

Consequently we obtain that for every $k \in \mathbb{N}$
$$\lim_{p\to\infty}\mathbb{E}m_k = m_{ST,k}.$$
This concludes the proof of the first part of the theorem.
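As a quick numerical sanity check of Lemma 2.8 (a sketch of ours, not part of the proof): the even moments of the semi-circle density $\frac{1}{2\pi}\sqrt{4-x^2}$ on $[-2,2]$ are indeed the Catalan numbers $1, 2, 5, 14, \ldots$, and the odd moments vanish by symmetry.

```python
import numpy as np
from math import comb

# Even moments of the semi-circle density (1/2pi)*sqrt(4 - x^2) on [-2, 2]
# versus the Catalan numbers beta_m.
x = np.linspace(-2.0, 2.0, 400001)
density = np.sqrt(4.0 - x ** 2) / (2.0 * np.pi)
dx = x[1] - x[0]
for m in range(1, 5):
    moment = float(np.sum(x ** (2 * m) * density) * dx)   # m_{ST, 2m} by a Riemann sum
    catalan = comb(2 * m, m) // (m + 1)                    # beta_m
    print(f"2m = {2 * m}:  moment ~ {moment:.4f},  Catalan = {catalan}")
```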

2.3.2. Proof of Equation (1.4). By definition, $\mathrm{Var}(m_k) = \mathbb{E}m_k^2 - (\mathbb{E}m_k)^2$. Equation (2.3) implies that
$$\mathbb{E}m_k^2 = n^{-2}(p/n)^{k}\sum_{\gamma_1,\gamma_2 \in P_k}\mathbb{E}\left(w_{\gamma_1}w_{\gamma_2}\right),$$
and Equation (2.4) implies that
$$(\mathbb{E}m_k)^2 = n^{-2}(p/n)^{k}\sum_{\gamma_1,\gamma_2 \in P_k}\mathbb{E}w_{\gamma_1}\,\mathbb{E}w_{\gamma_2}.$$
When $V_{\gamma_1} \cap V_{\gamma_2} = \varnothing$, $\mathbb{E}(w_{\gamma_1}w_{\gamma_2}) = \mathbb{E}w_{\gamma_1}\mathbb{E}w_{\gamma_2}$. If we denote by $I_k \subset P_k \times P_k$ the set of pairs $(\gamma_1, \gamma_2)$ such that $V_{\gamma_1} \cap V_{\gamma_2} \neq \varnothing$, then we can write
$$\mathrm{Var}(m_k) = n^{-2}(p/n)^{k}\sum_{(\gamma_1,\gamma_2) \in I_k}\left(\mathbb{E}(w_{\gamma_1}w_{\gamma_2}) - \mathbb{E}w_{\gamma_1}\mathbb{E}w_{\gamma_2}\right).$$
The estimate of the variance now follows from

Lemma 2.9.
$$n^{-2}(p/n)^{k}\sum_{(\gamma_1,\gamma_2)\in I_k}\left|\mathbb{E}(w_{\gamma_1}w_{\gamma_2})\right| = O\left(n^{-1}\right),$$
$$n^{-2}(p/n)^{k}\sum_{(\gamma_1,\gamma_2)\in I_k}\left|\mathbb{E}w_{\gamma_1}\right|\left|\mathbb{E}w_{\gamma_2}\right| = O\left(n^{-1}\right).$$

For a proof, see Appendix A.

This concludes the proof of the second part of the theorem.

2.4. Proof of Theorem 2.6. We begin by introducing notation: given a set $A$, we denote by $\Omega(A)$ the set of injective maps
$$\Omega(A) = \{S : A \hookrightarrow D\},$$
and consider $\Omega(A)$ as a probability space equipped with the uniform probability measure.

2.4.1. Proof of Equation (2.6). Let $\tau = [\gamma] \in P_k/\Sigma_n$ be an isomorphism class and assume that $k > 2(|V_\gamma| - 1)$. Our goal is to show that
$$\lim_{p\to\infty} n(\tau)\,|\mathbb{E}w_\tau| = 0.$$
On the one hand, by Equation (2.1), we have that $|[\gamma]| \leq n^{|V_\gamma|}$, therefore
$$(2.10)\qquad n(\tau) = n^{-1}(p/n)^{k/2}\,|[\gamma]| \leq p^{k/2}\, n^{|V_\gamma| - 1 - k/2}.$$
On the other hand,
$$\mathbb{E}w_\tau = |\Omega_n|^{-1}\sum_{S \in \Omega_n} w_\gamma(S) = |\Omega(V_\gamma)|^{-1}\sum_{S \in \Omega(V_\gamma)} w_\gamma(S).$$
By the triangle inequality, $|\mathbb{E}w_\gamma| \leq |\Omega(V_\gamma)|^{-1}\sum_{S \in \Omega(V_\gamma)}|w_\gamma(S)|$; moreover, by the incoherence condition (Equation (1.1)),
$$|w_\gamma(S)| \leq \mu^{k}\, p^{-k/2},$$
for every $S \in \Omega(V_\gamma)$. In conclusion, we get that $|\mathbb{E}w_\gamma| \leq \mu^{k} p^{-k/2}$, which combined with (2.10) yields
$$n(\tau)\,|\mathbb{E}w_\tau| = O\left(n^{|V_\gamma| - 1 - k/2}\right) \xrightarrow{p\to\infty} 0,$$
since, by assumption, $|V_\gamma| - 1 - k/2 < 0$.

This concludes the proof of Equation (2.6).

2.4.2. Proof of Equations (2.7) and (2.8). Let $\tau = [\gamma] \in P_k/\Sigma_n$ be an isomorphism class and assume that $k \leq 2(|V_\gamma| - 1)$. We prove Equations (2.7), (2.8) by induction on $|V_\gamma|$.

Since $k \leq 2(|V_\gamma| - 1)$, there exists a vertex $v = \gamma(i_0)$, where $0 \leq i_0 \leq k - 1$, which is crossed exactly once by the path $\gamma$. Let $v_l = \gamma(i_0 - 1)$ and $v_r = \gamma(i_0 + 1)$ be the vertices adjacent to $v$. We will deal with the following two cases separately:
• Case 1. $v_l \neq v_r$.
• Case 2. $v_l = v_r$.

Introduce the following auxiliary constructions.

If $v_l \neq v_r$, let $\widehat{\gamma}_v : [0, k-1] \to [1, n]$ denote the closed path of length $k - 1$ defined by
$$\widehat{\gamma}_v(j) = \begin{cases} \gamma(j) & j \leq i_0 - 1 \\ \gamma(j+1) & i_0 \leq j \leq k - 1 \end{cases}.$$
In words, the path $\widehat{\gamma}_v$ is obtained from $\gamma$ by deleting the vertex $v$ and inserting an edge connecting $v_l$ to $v_r$.

If $v_l = v_r$, let $\widehat{\gamma}_v : [0, k-2] \to [1, n]$ denote the closed path of length $k - 2$ defined by
$$\widehat{\gamma}_v(j) = \begin{cases} \gamma(j) & j \leq i_0 - 1 \\ \gamma(j+2) & i_0 \leq j \leq k - 2 \end{cases}.$$
In words, the path $\widehat{\gamma}_v$ is obtained from $\gamma$ by deleting the vertex $v$ and identifying the vertices $v_l$ and $v_r$.

In addition, for every $u \in V_\gamma \setminus \{v, v_l, v_r\}$, let $\gamma_u : [0, k] \to [1, n]$ denote the closed path of length $k$ defined by
$$\gamma_u(j) = \begin{cases} \gamma(j) & j \leq i_0 - 1 \\ u & j = i_0 \\ \gamma(j) & i_0 + 1 \leq j \leq k \end{cases}.$$
In words, the path $\gamma_u$ is obtained from $\gamma$ by deleting the vertex $v$ and inserting an edge connecting $v_l$ to $u$ followed by an edge connecting $u$ to $v_r$.

Important fact: the number of vertices in the paths $\widehat{\gamma}_v$, $\gamma_u$ is $|V_\gamma| - 1$.

The main technical statement is the following relation between the expectation $\mathbb{E}w_\gamma$ and the expectations $\mathbb{E}w_{\widehat{\gamma}_v}$, $\mathbb{E}w_{\gamma_u}$.

Proposition 2.10.
$$(2.11)\qquad \mathbb{E}w_\gamma \approx p^{-1}\,\mathbb{E}w_{\widehat{\gamma}_v} - (p\,|X|)^{-1}\sum_{u}\mathbb{E}w_{\gamma_u}.$$

For a proof, see Appendix A.

Analysis of case 1. In this case the path $\gamma$ is not a tree, hence our goal is to show that
$$\lim_{p\to\infty} n(\tau)\,\mathbb{E}w_\tau = 0.$$
The length of $\widehat{\gamma}_v$ is $k - 1$ and $|V_{\widehat{\gamma}_v}| = |V_\gamma| - 1$, therefore
$$n(\tau) \approx p^{k/2}\, n^{|V_\gamma| - 1 - k/2} \approx p^{1/2} n^{1/2}\, n([\widehat{\gamma}_v]).$$
The length of $\gamma_u$ is $k$ and $|V_{\gamma_u}| = |V_\gamma| - 1$, therefore
$$n(\tau) \approx p^{k/2}\, n^{|V_\gamma| - 1 - k/2} \approx n \cdot n([\gamma_u]).$$
Applying the above to (2.11) we obtain
$$n(\tau)\,\mathbb{E}w_\tau \approx (n/p)^{1/2}\, n([\widehat{\gamma}_v])\,\mathbb{E}w_{\widehat{\gamma}_v}.$$
By estimate (2.6) and the induction hypothesis, $n([\widehat{\gamma}_v])\,\mathbb{E}w_{\widehat{\gamma}_v} = O(1)$; therefore $\lim_{p\to\infty} n(\tau)\,\mathbb{E}w_\tau = 0$, since $n/p = o(1)$ (recall that we take $n = p^{1-\varepsilon}$).

This concludes the proof of Equation (2.7) in case 1.

Analysis of case 2. The length of $\widehat{\gamma}_v$ is $k - 2$ and $|V_{\widehat{\gamma}_v}| = |V_\gamma| - 1$, therefore
$$n(\tau) \approx p^{k/2}\, n^{|V_\gamma| - 1 - k/2} \approx p\, n([\widehat{\gamma}_v]).$$

The length of $\gamma_u$ is $k$ and $|V_{\gamma_u}| = |V_\gamma| - 1$, therefore
$$n(\tau) \approx p^{k/2}\, n^{|V_\gamma| - 1 - k/2} \approx n \cdot n([\gamma_u]).$$
Applying the above to (2.11) yields
$$n(\tau)\,\mathbb{E}w_\tau \approx n([\widehat{\gamma}_v])\,\mathbb{E}w_{\widehat{\gamma}_v}.$$
If $\gamma$ is a tree then $\widehat{\gamma}_v$ is also a tree with a smaller number of vertices; therefore, by the induction hypothesis, $\lim_{p\to\infty} n([\widehat{\gamma}_v])\,\mathbb{E}w_{\widehat{\gamma}_v} = 1$, which implies by (2.11) that $\lim_{p\to\infty} n(\tau)\,\mathbb{E}w_\tau = 1$ as well. If $\gamma$ is not a tree then $\widehat{\gamma}_v$ is not a tree either, hence by the induction hypothesis $\lim_{p\to\infty} n([\widehat{\gamma}_v])\,\mathbb{E}w_{\widehat{\gamma}_v} = 0$, which implies that $\lim_{p\to\infty} n(\tau)\,\mathbb{E}w_\tau = 0$, proving Equation (2.7) in this case as well.

This concludes the proof of Equations (2.7) and (2.8).

3. Examples of incoherent dictionaries

3.1. Representation theory.

3.1.1. The Heisenberg group. Let $(V, \omega)$ be a two-dimensional symplectic vector space over the finite field $\mathbb{F}_p$. The reader should think of $V$ as $\mathbb{F}_p \times \mathbb{F}_p$ with the standard symplectic form
$$\omega((\tau, w), (\tau', w')) = \tau w' - w\tau'.$$
Considering $V$ as an abelian group, it admits a non-trivial central extension called the Heisenberg group. Concretely, the group $H$ can be presented as the set $H = V \times \mathbb{F}_p$ with the multiplication given by
$$(v, z) \cdot (v', z') = \left(v + v',\, z + z' + \tfrac{1}{2}\omega(v, v')\right).$$
The center of $H$ is $Z = Z(H) = \{(0, z) : z \in \mathbb{F}_p\}$. The symplectic group $Sp = Sp(V, \omega)$, which in this case is isomorphic to $SL_2(\mathbb{F}_p)$, acts by automorphisms of $H$ through its tautological action on the $V$-coordinate; that is, a matrix
$$g = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$
sends an element $(v, z)$, where $v = (\tau, w)$, to the element $(gv, z)$, where $gv = (a\tau + bw, c\tau + dw)$.

3.1.2. The Heisenberg representation. One of the most important attributes of the group $H$ is that it admits, principally, a unique irreducible representation. The precise statement goes as follows: let $\psi : Z \to S^1$ be a non-degenerate unitary character of the center; for example, in this paper we take $\psi(z) = e^{\frac{2\pi i}{p}z}$. It is not difficult to show [T] that

Theorem 3.1 (Stone-von Neumann). There exists a unique (up to isomorphism) irreducible unitary representation $\pi : H \to U(\mathcal{H})$ with central character $\psi$, that is, $\pi(z) = \psi(z)\cdot \mathrm{Id}_{\mathcal{H}}$, for every $z \in Z$.

The representation $\pi$ which appears in the above theorem will be called the Heisenberg representation.

The representation $\pi : H \to U(\mathcal{H})$ can be realized as follows: $\mathcal{H}$ is the Hilbert space $\mathbb{C}(\mathbb{F}_p)$ of complex valued functions on the finite line, with the standard inner product
$$\langle f, g \rangle = \sum_{t \in \mathbb{F}_p} f(t)\overline{g(t)},$$
for every $f, g \in \mathbb{C}(\mathbb{F}_p)$, and the action $\pi$ is given by
• $\pi(\tau, 0)[f](t) = f(t + \tau)$;
• $\pi(0, w)[f](t) = \psi(wt)f(t)$;
• $\pi(z)[f](t) = \psi(z)f(t)$, $z \in Z$.

Here we are using $\tau$ to indicate the first coordinate and $w$ to indicate the second coordinate of $V \simeq \mathbb{F}_p \times \mathbb{F}_p$. We will call this explicit realization the standard realization.
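The standard realization is easy to write down numerically. The following sketch (our own illustration, for a small assumed prime) builds the operators $\pi(\tau, 0)$ and $\pi(0, w)$ as $p \times p$ matrices acting on $\mathbb{C}(\mathbb{F}_p)$, checks that they are unitary, and verifies a form of the Heisenberg commutation relation $\pi(\tau,0)\pi(0,w) = \psi(\tau w)\pi(0,w)\pi(\tau,0)$.

```python
import numpy as np

p = 7
psi = lambda z: np.exp(2j * np.pi * (z % p) / p)     # central character psi(z) = e^{2 pi i z / p}

def translation(tau):
    """pi(tau, 0)[f](t) = f(t + tau)."""
    T = np.zeros((p, p), dtype=complex)
    for t in range(p):
        T[t, (t + tau) % p] = 1.0
    return T

def modulation(w):
    """pi(0, w)[f](t) = psi(w t) f(t)."""
    return np.diag([psi(w * t) for t in range(p)])

T, Mw = translation(1), modulation(1)
# Unitarity of both operators.
print(np.allclose(T @ T.conj().T, np.eye(p)), np.allclose(Mw @ Mw.conj().T, np.eye(p)))
# Commutation relation: pi(1,0) pi(0,1) = psi(1) pi(0,1) pi(1,0).
print(np.allclose(T @ Mw, psi(1) * (Mw @ T)))
```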

3.1.3. The Weil representation. A direct consequence of Theorem 3.1 is the existence of a projective unitary representation $\widetilde{\rho} : Sp \to PU(\mathcal{H})$. The construction of $\widetilde{\rho}$ out of the Heisenberg representation $\pi$ is due to Weil [W] and it goes as follows: considering the Heisenberg representation $\pi : H \to U(\mathcal{H})$ and an element $g \in Sp$, one can define a new representation $\pi^g : H \to U(\mathcal{H})$ by $\pi^g(h) = \pi(g(h))$. Clearly both $\pi$ and $\pi^g$ have the same central character, hence by Theorem 3.1 they are isomorphic. Since the space of intertwining morphisms $\mathrm{Hom}_H(\pi, \pi^g)$ is one dimensional, choosing for every $g \in Sp$ a non-zero representative $\widetilde{\rho}(g) \in \mathrm{Hom}_H(\pi, \pi^g)$ gives the required projective representation. In more concrete terms, the projective representation $\widetilde{\rho}$ is characterized by Egorov's condition:
$$(3.1)\qquad \widetilde{\rho}(g)\,\pi(h)\,\widetilde{\rho}(g^{-1}) = \pi(g(h)),$$
for every $g \in Sp$ and $h \in H$.

The important and non-trivial statement is that the projective representation $\widetilde{\rho}$ can be linearized in a unique manner into an honest unitary representation:

Theorem 3.2. There exists a unique¹ unitary representation
$$\rho : Sp \longrightarrow U(\mathcal{H}),$$
such that every operator $\rho(g)$ satisfies Equation (3.1).

¹ Unique, except in the case where the finite field is $\mathbb{F}_3$.

For the sake of concreteness, let us give an explicit description (which can be directly verified using Equation (3.1)) of the operators $\rho(g)$, for different elements $g \in Sp$, as they appear in the standard realization. The operators will be specified up to a unitary scalar.

• The standard diagonal subgroup $A \subset Sp$ acts by (normalized) scaling: an element
$$a = \begin{pmatrix} a & 0 \\ 0 & a^{-1} \end{pmatrix}$$
acts by
$$(3.2)\qquad S_a[f](t) = \sigma(a)\, f\left(a^{-1}t\right),$$
where $\sigma : \mathbb{F}_p^\times \to \{\pm 1\}$ is the unique non-trivial quadratic character of the multiplicative group $\mathbb{F}_p^\times$ (also called the Legendre character), given by $\sigma(a) = a^{\frac{p-1}{2}} \pmod{p}$.

• The subgroup of strictly upper triangular elements $U \subset Sp$ acts by quadratic exponents (chirps): an element
$$u = \begin{pmatrix} 1 & u \\ 0 & 1 \end{pmatrix}$$

acts by
$$M_u[f](t) = \psi\!\left(\tfrac{u}{2}t^2\right) f(t).$$

• The Weyl element
$$w = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$$
acts by the discrete Fourier transform
$$F[f](w) = \frac{1}{\sqrt{p}}\sum_{t \in \mathbb{F}_p} \psi(wt)\, f(t).$$
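For concreteness, here is a numerical sketch of ours of the three operators just described, realized as $p \times p$ matrices: the normalized scaling $S_a$, the chirp $M_u$ (using the convention $\psi(\tfrac{u}{2}t^2)$ with $2^{-1}$ computed mod $p$), and the discrete Fourier transform $F$. Recall that the operators are only specified up to a unitary scalar, so the sketch merely checks unitarity.

```python
import numpy as np

p = 11
psi = lambda z: np.exp(2j * np.pi * (z % p) / p)
legendre = lambda a: 1 if pow(a, (p - 1) // 2, p) == 1 else -1     # sigma(a)

def scaling(a):
    """S_a[f](t) = sigma(a) * f(a^{-1} t)  -- normalized scaling."""
    a_inv = pow(a, -1, p)
    S = np.zeros((p, p), dtype=complex)
    for t in range(p):
        S[t, (a_inv * t) % p] = legendre(a)
    return S

def chirp(u):
    """M_u[f](t) = psi(u/2 * t^2) f(t)  -- quadratic exponent (chirp)."""
    half = pow(2, -1, p)
    return np.diag([psi(u * half * t * t) for t in range(p)])

def fourier():
    """F[f](w) = p^{-1/2} * sum_t psi(w t) f(t)  -- discrete Fourier transform."""
    t = np.arange(p)
    return np.exp(2j * np.pi * np.outer(t, t) / p) / np.sqrt(p)

# All three operators are unitary:
for U in (scaling(3), chirp(1), fourier()):
    print(np.allclose(U @ U.conj().T, np.eye(p)))
```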

3.2. The Heisenberg dictionary. The Heisenberg dictionary is a collection of $p + 1$ orthonormal bases, each characterized, roughly, as eigenvectors of a specific linear operator. An elegant way to define this dictionary is using the Heisenberg representation [H, HCM].

3.2.1. Bases associated with lines. The Heisenberg group is non-commutative, yet it contains various commutative subgroups which can be easily described as follows. Let $L \subset V$ be a line in $V$. One can associate to $L$ a commutative subgroup $A_L \subset H$, given by $A_L = \{(l, 0) : l \in L\}$. It will be convenient to identify the group $A_L$ with the line $L$. Restricting the Heisenberg representation $\pi$ to the commutative subgroup $L$, namely, considering the restricted representation $\pi : L \to U(\mathcal{H})$, one obtains a collection of operators $\{\pi(l) : l \in L\}$ which commute pairwise. This, in turn, yields an orthogonal decomposition into character spaces
$$\mathcal{H} = \bigoplus_{\chi} \mathcal{H}_\chi,$$
where $\chi$ runs in the set $\widehat{L}$ of unitary characters of $L$.

A more concrete way to specify the above decomposition is by choosing a non-zero vector $l_0 \in L$. After such a choice, the character space $\mathcal{H}_\chi$ naturally corresponds to the eigenspace of the linear operator $\pi(l_0)$ associated with the eigenvalue $\lambda = \chi(l_0)$. It is not difficult to verify in this case that

Lemma 3.3. For every $\chi \in \widehat{L}$ we have $\dim \mathcal{H}_\chi = 1$.

Choosing a vector $\varphi_\chi \in \mathcal{H}_\chi$ of unit norm $\|\varphi_\chi\| = 1$, for every $\chi \in \widehat{L}$ which appears in the decomposition, we obtain an orthonormal basis which we denote by $B_L$.

Theorem 3.4 ([H, HCM]). For every pair of different lines $L, M \subset V$ and for every $\varphi \in B_L$, $\phi \in B_M$,
$$|\langle \varphi, \phi \rangle| = \frac{1}{\sqrt{p}}.$$

Since there exist $p + 1$ different lines in $V$, we obtain in this manner a collection of $p + 1$ orthonormal bases
$$D_H = \coprod_{L \subset V} B_L,$$
which are $\mu = 1$-coherent. We will call this dictionary, for obvious reasons, the Heisenberg dictionary.

3.3. The oscillator dictionary. Reflecting back on the Heisenberg dictionary, we see that it consists of a collection of orthonormal bases characterized in terms of commutative families of unitary operators, where each such family is associated with a commutative subgroup in the Heisenberg group $H$ via the Heisenberg representation $\pi : H \to U(\mathcal{H})$. In comparison, the oscillator dictionary [GHS] is characterized in terms of commutative families of unitary operators which are associated with commutative subgroups in the symplectic group $Sp$ via the Weil representation $\rho : Sp \to U(\mathcal{H})$.

3.3.1. Maximal tori. The commutative subgroups in $Sp$ that we consider are called maximal algebraic tori [B] (not to be confused with the notion of a topological torus). A maximal (algebraic) torus in $Sp$ is a maximal commutative subgroup which becomes diagonalizable over some field extension. The most standard example of a maximal algebraic torus is the standard diagonal torus
$$A = \left\{\begin{pmatrix} a & 0 \\ 0 & a^{-1} \end{pmatrix} : a \in \mathbb{F}_p^\times\right\}.$$
Standard linear algebra shows that, up to conjugation², there exist two classes of maximal (algebraic) tori in $Sp$. The first class consists of those tori which are diagonalizable already over $\mathbb{F}_p$, namely, tori $T$ which are conjugate to the standard diagonal torus $A$, or more precisely such that there exists an element $g \in Sp$ so that $g \cdot T \cdot g^{-1} = A$. A torus in this class is called a split torus. The second class consists of those tori which become diagonalizable only over the quadratic extension $\mathbb{F}_{p^2}$, namely, tori which are not conjugate to the standard diagonal torus $A$. A torus in this class is called a non-split torus (sometimes it is called an inert torus).

All split (respectively, non-split) tori are conjugate to one another; therefore the number of split tori is the number of elements in the coset space $Sp/N$ (see [A] for basics of group theory), where $N$ is the normalizer group of $A$; we have
$$\#(Sp/N) = \frac{p(p+1)}{2},$$
and the number of non-split tori is the number of elements in the coset space $Sp/M$, where $M$ is the normalizer group of some non-split torus; we have
$$\#(Sp/M) = p(p-1).$$

² Two elements $h_1, h_2$ in a group $G$ are called conjugate if there exists an element $g \in G$ such that $g \cdot h_1 \cdot g^{-1} = h_2$. More generally, two subgroups $H_1, H_2 \subset G$ are called conjugate subgroups if there exists an element $g \in G$ such that $g \cdot H_1 \cdot g^{-1} = H_2$.

Example of a non-split maximal torus. It might be suggestive to explain further the notion of a non-split torus by exploring, first, the analogous notion in the more familiar setting of the field $\mathbb{R}$. Here, the standard example of a maximal non-split torus is the circle group $SO(2) \subset SL_2(\mathbb{R})$. Indeed, it is a maximal commutative subgroup which becomes diagonalizable when considered over the extension field $\mathbb{C}$ of complex numbers. The above analogy suggests a way to construct examples of maximal non-split tori in the finite field setting as well.

Let us assume for simplicity that $-1$ does not admit a square root in $\mathbb{F}_p$, or equivalently that $p \equiv -1 \pmod{4}$. The group $Sp$ acts naturally on the plane $V = \mathbb{F}_p \times \mathbb{F}_p$. Consider the standard symmetric form $B$ on $V$ given by
$$B((x, y), (x', y')) = xx' + yy'.$$
An example of a maximal non-split torus is the subgroup $SO = SO(V, B) \subset Sp$ consisting of all elements $g \in Sp$ preserving the form $B$, namely $g \in SO$ if and only if $B(gu, gv) = B(u, v)$ for every $u, v \in V$. In coordinates, $SO$ consists of all matrices $A \in SL_2(\mathbb{F}_p)$ which satisfy $AA^t = I$. The reader might think of $SO$ as the "finite circle".

3.3.2. Bases associated with maximal tori. Restricting the Weil representation to a maximal torus $T \subset Sp$ yields an orthogonal decomposition into character spaces
$$(3.3)\qquad \mathcal{H} = \bigoplus_{\chi} \mathcal{H}_\chi,$$
where $\chi$ runs in the set $\widehat{T}$ of unitary characters of the torus $T$.

A more concrete way to specify the above decomposition is by choosing a generator³ $t_0 \in T$, that is, an element such that every $t \in T$ can be written in the form $t = t_0^n$, for some $n \in \mathbb{N}$. After such a choice, the character space $\mathcal{H}_\chi$ which appears in (3.3) naturally corresponds to the eigenspace of the linear operator $\rho(t_0)$ associated to the eigenvalue $\lambda = \chi(t_0)$.

³ A maximal torus $T$ in $SL_2(\mathbb{F}_p)$ is a cyclic group, thus there exists a generator.

The decomposition (3.3) depends on the type of $T$ in the following manner:
• In the case where $T$ is a split torus we have $\dim \mathcal{H}_\chi = 1$ unless $\chi = \sigma$, where $\sigma : T \to \{\pm 1\}$ is the unique non-trivial quadratic character of $T$ (also called the Legendre character of $T$); in the latter case $\dim \mathcal{H}_\sigma = 2$.
• In the case where $T$ is a non-split torus, $\dim \mathcal{H}_\chi = 1$ for every character $\chi$ which appears in the decomposition; in this case the quadratic character $\sigma$ does not appear in the decomposition (for details see [GH]).

Choosing for every character $\chi \in \widehat{T}$, $\chi \neq \sigma$, a vector $\varphi_\chi \in \mathcal{H}_\chi$ of unit norm, we obtain an orthonormal system of vectors $B_T = \{\varphi_\chi : \chi \neq \sigma\}$.

Important fact: in the case when $T$ is a non-split torus, the set $B_T$ is an orthonormal basis.

Example 3.5. It would be beneficial to describe explicitly the system $B_A$ when $A \simeq \mathbb{G}_m$ is the standard diagonal torus. The torus $A$ acts on the Hilbert space $\mathcal{H}$ by scaling (see Equation (3.2)). For every $\chi \neq \sigma$, define a function $\varphi_\chi \in \mathbb{C}(\mathbb{F}_p)$ as follows:
$$\varphi_\chi(t) = \begin{cases} \frac{1}{\sqrt{p-1}}\chi(t) & t \neq 0 \\ 0 & t = 0 \end{cases}.$$
It is easy to verify that $\varphi_\chi$ is a character vector with respect to the action $\rho : A \to U(\mathcal{H})$ associated to the character $\chi \cdot \sigma$. Concluding, the orthonormal system $B_A$ is the set $\{\varphi_\chi : \chi \in \widehat{\mathbb{G}}_m,\ \chi \neq \sigma\}$.
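A numerical sketch of Example 3.5 (our own illustration, with the assumption that $2$ generates $\mathbb{F}_{13}^\times$ for the toy prime used): enumerate the multiplicative characters of $\mathbb{F}_p^\times$ via a generator, exclude the Legendre character, and check that the resulting vectors $\varphi_\chi$ form an orthonormal system of $p - 2$ vectors.

```python
import numpy as np

p = 13
g = 2                                              # assumed: 2 generates F_13^*
log = {pow(g, k, p): k for k in range(p - 1)}      # discrete logarithm table

def phi(m):
    """phi_chi for the character chi(g^k) = exp(2*pi*i*m*k/(p-1)); phi_chi(0) = 0."""
    v = np.zeros(p, dtype=complex)
    for t in range(1, p):
        v[t] = np.exp(2j * np.pi * m * log[t] / (p - 1)) / np.sqrt(p - 1)
    return v

# Characters m = 0..p-2; m = (p-1)/2 is the Legendre (quadratic) character sigma, which is excluded.
B_A = np.stack([phi(m) for m in range(p - 1) if m != (p - 1) // 2], axis=1)
print(B_A.shape)                                   # p x (p - 2) system
print(np.allclose(B_A.conj().T @ B_A, np.eye(p - 2)))   # orthonormality
```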

Theorem 3.6 ([GHS]). Let $T_1 \neq T_2$ be two distinct maximal tori, and let $\phi \in B_{T_1}$ and $\varphi \in B_{T_2}$. Then
$$|\langle \phi, \varphi \rangle| \leq \frac{4}{\sqrt{p}}.$$

Since there exist $p(p-1)$ distinct non-split tori in $Sp$, we obtain in this manner a collection of $p(p-1)$ orthonormal bases
$$D_O = \coprod_{\substack{T \subset Sp \\ \text{non-split}}} B_T,$$
which are $\mu = 4$-coherent. We will call this dictionary the oscillator dictionary.

3.4. The extended oscillator dictionary.

3.4.1. The Jacobi group. Let us denote by $J$ the semi-direct product of groups
$$J = Sp \ltimes H.$$
The group $J$ will be referred to as the Jacobi group.

3.4.2. The Heisenberg-Weil representation. The Heisenberg representation $\pi : H \to U(\mathcal{H})$ and the Weil representation $\rho : Sp \to U(\mathcal{H})$ combine to a representation of the Jacobi group
$$\tau = \rho \ltimes \pi : J \to U(\mathcal{H}),$$
defined by $\tau(g, h) = \rho(g)\pi(h)$. The fact that $\tau$ is indeed a representation is a direct consequence of Egorov's condition, Equation (3.1). We will refer to the representation $\tau$ as the Heisenberg-Weil representation.

3.4.3. Maximal tori in the Jacobi group. Given a non-split torus $T \subset Sp$, the conjugate subgroup $T_v = vTv^{-1} \subset J$, for every $v \in V$, will be called a maximal non-split torus in $J$. It is easy to verify that the subgroups $T_v, T_u$ are distinct for $v \neq u$; moreover, for different tori $T \neq T' \subset Sp$ the subgroups $T_v, T'_u$ are distinct for every $v, u \in V$. This implies that there are $p(p-1)p^2$ non-split maximal tori in $J$.

3.4.4. Bases associated with maximal tori. Restricting the Heisenberg-Weil representation $\tau$ to a maximal non-split torus $T \subset J$ yields a basis $B_T$ consisting of character vectors. A way to think of the basis $B_T$ is as follows: if $T = T_v$ where $T$ is a maximal torus in $Sp$, then the basis $B_{T_v}$ can be derived from the already known basis $B_T$ by
$$B_{T_v} = \pi(v)B_T,$$
namely, the basis $B_{T_v}$ consists of the vectors $\pi(v)\varphi$ where $\varphi \in B_T$.

Interestingly, given any two tori $T_1, T_2 \subset J$, the bases $B_{T_1}, B_{T_2}$ remain $\mu = 4$-coherent; this is a direct consequence of the following generalization of Theorem 3.6:

Theorem 3.7 ([GHS]). Given (not necessarily distinct) tori $T_1, T_2 \subset Sp$ and a pair of vectors $\varphi \in B_{T_1}$, $\phi \in B_{T_2}$,
$$|\langle \varphi, \pi(v)\phi \rangle| \leq \frac{4}{\sqrt{p}},$$
for every $v \in V$.

Since there exist $p(p-1)p^2$ distinct non-split tori in $J$, we obtain in this manner a collection of $p(p-1)p^2 \approx p^4$ orthonormal bases
$$D_{EO} = \coprod_{\substack{T \subset J \\ \text{non-split}}} B_T,$$
which are $\mu = 4$-coherent. We will call this dictionary the extended oscillator dictionary.

Remark 3.8. A way to interpret Theorem 3.7 is to say that any two different vectors $\varphi \neq \phi \in D_O$ are incoherent in a stable sense, that is, their coherency is at most $4/\sqrt{p}$ no matter if any one of them undergoes an arbitrary time/phase shift. This property seems to be important in communication, where a transmitted signal may acquire a time shift due to asynchronous communication and a phase shift due to the Doppler effect.

Appendix A. Proof of statements

A.1. Proof of Lemma 2.5. Let $\gamma \in P_k$ and $\sigma \in \Sigma_n$. By definition, $\sigma(\gamma) = \sigma \circ \gamma : [0, k] \to [1, n]$. Write
$$\mathbb{E}w_{\sigma(\gamma)} = |\Omega_n|^{-1}\sum_{S \in \Omega_n} w_{\sigma(\gamma)}(S).$$
Direct verification reveals that $w_{\sigma(\gamma)}(S) = w_\gamma(\sigma(S))$, where $\sigma(S) = S \circ \sigma : [1, n] \to D$, hence
$$\sum_{S \in \Omega_n} w_{\sigma(\gamma)}(S) = \sum_{S \in \Omega_n} w_\gamma(\sigma(S)) = \sum_{S \in \Omega_n} w_\gamma(S),$$
which implies that $\mathbb{E}w_{\sigma(\gamma)} = \mathbb{E}w_\gamma$.

This concludes the proof of the lemma.

A.2. Proof of Lemma 2.7. We need to introduce the notion of a Dyck word.

Definition A.1. A Dyck word of length $2m$ is a sequence $D = d_1 d_2 \cdots d_{2m}$, where $d_i = \pm 1$, which satisfies
$$\sum_{i=1}^{l} d_i \geq 0,$$
for every $l = 1, \ldots, 2m$, and $\sum_{i=1}^{2m} d_i = 0$.

Let us denote by $\mathcal{D}_{2m}$ the set of Dyck words of length $2m$. It is well known that $|\mathcal{D}_{2m}| = \beta_m$. In addition, let us denote by $T_{2m} \subset P_{2m}$ the subset of trees of length $2m$. Our goal is to establish a bijection
$$D : T_{2m}/\Sigma_n \overset{\simeq}{\longrightarrow} \mathcal{D}_{2m}.$$
Given a tree $\gamma \in T_{2m}$, define the word $D(\gamma) = d_1 d_2 \cdots d_{2m}$ as follows:
$$d_i = \begin{cases} +1 & \text{if } \gamma(i) \text{ is crossed for the first time at the } i\text{th step} \\ -1 & \text{otherwise} \end{cases}.$$
The word $D(\gamma)$ is a Dyck word, since $\sum_{i=1}^{l} d_i$ counts the number of edges of $G_\gamma$ crossed exactly once by $\gamma$ in the first $l$ steps; therefore it is greater than or equal to zero.

In one direction, if two trees $\gamma_1, \gamma_2$ are isomorphic then $D(\gamma_1) = D(\gamma_2)$. In addition, it is easy to verify that the tree $\gamma$ can be reconstructed from the pair $(D(\gamma), \overrightarrow{V_\gamma})$, where $\overrightarrow{V_\gamma}$ is the set of vertices of $\gamma$ equipped with the following linear order: $v < u$ if and only if $\gamma$ crosses $v$ for the first time before it crosses $u$ for the first time. This implies that $D$ defines an injection from $T_{2m}/\Sigma_n$ into $\mathcal{D}_{2m}$. Conversely, it is easy to verify that for every Dyck word $D \in \mathcal{D}_{2m}$ there is a tree $\gamma \in T_{2m}$ such that $D = D(\gamma)$, which implies that the map $D$ is surjective.

This concludes the proof of the lemma.
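A small brute-force enumeration (our own sketch) confirming the count $|\mathcal{D}_{2m}| = \beta_m$ used above:

```python
from itertools import product
from math import comb

def is_dyck(word):
    """word is a tuple of +1/-1 entries; Dyck condition: all partial sums >= 0 and total sum 0."""
    s = 0
    for d in word:
        s += d
        if s < 0:
            return False
    return s == 0

# |D_{2m}| equals the Catalan number beta_m.
for m in range(1, 6):
    count = sum(is_dyck(w) for w in product((1, -1), repeat=2 * m))
    print(m, count, comb(2 * m, m) // (m + 1))
```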

A.3. Proof of Lemma 2.9. We begin with an auxiliary construction. Define a map $\sqcup : I_k \to P_{2k}$ as follows: given $(\gamma_1, \gamma_2) \in I_k$, let $0 \leq i_1 \leq k$ be the first index such that $\gamma_1(i_1) \in V_{\gamma_2}$ and let $0 \leq i_2 \leq k$ be the first index such that $\gamma_1(i_1) = \gamma_2(i_2)$. Define
$$\gamma_1 \sqcup \gamma_2(j) = \begin{cases} \gamma_1(j) & 0 \leq j \leq i_1 \\ \gamma_2(i_2 - i_1 + j) & i_1 \leq j \leq i_1 + k \\ \gamma_1(j - k) & i_1 + k \leq j \leq 2k \end{cases},$$
where $\gamma_2$ is regarded as a function on $\mathbb{Z}/k\mathbb{Z}$.

In words, the path $\gamma_1 \sqcup \gamma_2$ is obtained, roughly, by substituting the path $\gamma_2$ in place of the vertex $\gamma_1(i_1)$. Clearly, the map $\sqcup$ is injective and commutes with the action of $\Sigma_n$; therefore we get, in particular, that the number of elements in the isomorphism class $[\gamma_1, \gamma_2] \in I_k/\Sigma_n$ is smaller than or equal to the number of elements in the isomorphism class $[\gamma_1 \sqcup \gamma_2] \in P_{2k}/\Sigma_n$:
$$(\mathrm{A.1})\qquad |[\gamma_1, \gamma_2]| \leq |[\gamma_1 \sqcup \gamma_2]|.$$

First estimate. We need to show that
$$n^{-2}(p/n)^{k}\sum_{(\gamma_1,\gamma_2)\in I_k}\left|\mathbb{E}(w_{\gamma_1}w_{\gamma_2})\right| = O\left(n^{-1}\right).$$
Write
$$\sum_{(\gamma_1,\gamma_2)\in I_k}\left|\mathbb{E}(w_{\gamma_1}w_{\gamma_2})\right| = \sum_{[\gamma_1,\gamma_2]\in I_k/\Sigma_n}|[\gamma_1,\gamma_2]|\cdot\left|\mathbb{E}(w_{\gamma_1}w_{\gamma_2})\right|.$$
It is enough to show that for every $[\gamma_1, \gamma_2] \in I_k/\Sigma_n$
$$(\mathrm{A.2})\qquad n^{-2}(p/n)^{k}\,|[\gamma_1,\gamma_2]|\cdot|\mathbb{E}(w_{\gamma_1}w_{\gamma_2})| = O\left(n^{-1}\right).$$
Fix an isomorphism class $[\gamma_1, \gamma_2] \in I_k/\Sigma_n$. By Equation (A.1) we have that $|[\gamma_1,\gamma_2]| \leq |[\gamma_1 \sqcup \gamma_2]|$. In addition, a simple observation reveals that $w_{\gamma_1}w_{\gamma_2} = w_{\gamma_1 \sqcup \gamma_2}$, which implies that $\mathbb{E}(w_{\gamma_1}w_{\gamma_2}) = \mathbb{E}(w_{\gamma_1 \sqcup \gamma_2})$. In conclusion, since the length of $\gamma_1 \sqcup \gamma_2$ is $2k$, we get that
$$n^{-2}(p/n)^{k}\,|[\gamma_1,\gamma_2]|\cdot|\mathbb{E}(w_{\gamma_1}w_{\gamma_2})| \leq n^{-1}\, n([\gamma_1 \sqcup \gamma_2])\left|\mathbb{E}\left(w_{\gamma_1 \sqcup \gamma_2}\right)\right|.$$
Finally, by Theorem 2.6, we have that $n([\gamma_1 \sqcup \gamma_2])\left|\mathbb{E}\left(w_{\gamma_1 \sqcup \gamma_2}\right)\right| = O(1)$, hence Equation (A.2) follows.

This concludes the proof of the first estimate.

Second estimate. We need to show that
$$n^{-2}(p/n)^{k}\sum_{(\gamma_1,\gamma_2)\in I_k}\left|\mathbb{E}w_{\gamma_1}\right|\left|\mathbb{E}w_{\gamma_2}\right| = O\left(n^{-1}\right).$$
Write
$$\sum_{(\gamma_1,\gamma_2)\in I_k}\left|\mathbb{E}w_{\gamma_1}\right|\left|\mathbb{E}w_{\gamma_2}\right| = \sum_{[\gamma_1,\gamma_2]\in I_k/\Sigma_n}|[\gamma_1,\gamma_2]|\cdot\left|\mathbb{E}w_{\gamma_1}\right|\left|\mathbb{E}w_{\gamma_2}\right|.$$
It is enough to show that for every $[\gamma_1, \gamma_2] \in I_k/\Sigma_n$
$$(\mathrm{A.3})\qquad n^{-2}(p/n)^{k}\,|[\gamma_1,\gamma_2]|\cdot\left|\mathbb{E}w_{\gamma_1}\right|\left|\mathbb{E}w_{\gamma_2}\right| = O\left(n^{-1}\right).$$

Fix an isomorphism class $[\gamma_1, \gamma_2] \in I_k/\Sigma_n$. By Equation (A.1) we have that $|[\gamma_1,\gamma_2]| \leq |[\gamma_1 \sqcup \gamma_2]|$. For every path $\gamma$, we have that $|[\gamma]| = n_{(|V_\gamma|)} \approx n^{|V_\gamma|}$ (since always $|V_\gamma| \leq k$ and we assume that $k$ is fixed, that is, it does not depend on $p$); in particular,
$$|[\gamma_1]| \approx n^{|V_{\gamma_1}|},\qquad |[\gamma_2]| \approx n^{|V_{\gamma_2}|},\qquad |[\gamma_1 \sqcup \gamma_2]| \approx n^{|V_{\gamma_1 \sqcup \gamma_2}|}.$$
By construction, $|V_{\gamma_1 \sqcup \gamma_2}| \leq |V_{\gamma_1}| + |V_{\gamma_2}| - 1$ (we assume that $V_{\gamma_1} \cap V_{\gamma_2} \neq \varnothing$), therefore $|[\gamma_1 \sqcup \gamma_2]| = O\left(n^{-1}|[\gamma_1]|\,|[\gamma_2]|\right)$. In conclusion, we get that
$$n^{-2}(p/n)^{k}\,|[\gamma_1,\gamma_2]|\cdot\left|\mathbb{E}w_{\gamma_1}\right|\left|\mathbb{E}w_{\gamma_2}\right| = O\left(n^{-1}\, n([\gamma_1])\,n([\gamma_2])\left|\mathbb{E}w_{\gamma_1}\right|\left|\mathbb{E}w_{\gamma_2}\right|\right),$$
where we used the identity $n^{-2}(p/n)^{k}\,|[\gamma_1]|\,|[\gamma_2]| = n([\gamma_1])\,n([\gamma_2])$. Finally, by Theorem 2.6, we have that $n([\gamma_i])\left|\mathbb{E}w_{\gamma_i}\right| = O(1)$, $i = 1, 2$, hence Equation (A.3) follows.

This concludes the proof of the second estimate, and concludes the proof of the lemma.

A.4. Proof of Proposition 2.10. Write
$$(\mathrm{A.4})\qquad \mathbb{E}w_\gamma = |\Omega(V_\gamma)|^{-1}\sum_{S \in \Omega(V_\gamma)} w_\gamma(S) = |\Omega(V_\gamma)|^{-1}\sum_{S \in \Omega(V_\gamma\setminus\{v\})}\ \sum_{b \in D\setminus S(V_\gamma\setminus\{v\})} w_\gamma(S \sqcup b),$$
where $S \sqcup b : V_\gamma \to D$ is given by
$$S \sqcup b\,(u) = \begin{cases} S(u) & u \neq v \\ b & u = v \end{cases}.$$
Write
$$\sum_{b \in D\setminus S(V_\gamma\setminus\{v\})} w_\gamma(S \sqcup b) = \sum_{b \in D} w_\gamma(S \sqcup b) - \sum_{b \in S(V_\gamma\setminus\{v\})} w_\gamma(S \sqcup b).$$
Let us analyze separately the two terms on the right side of the above equation.

First term. Write
$$\sum_{b \in D} w_\gamma(S \sqcup b) = \sum_{x \in X}\ \sum_{b_x \in B_x} w_\gamma(S \sqcup b_x).$$
Furthermore,
$$w_\gamma(S \sqcup b_x) = \langle\cdot,\cdot\rangle \cdots \langle S(v_l), b_x\rangle\langle b_x, S(v_r)\rangle \cdots \langle\cdot,\cdot\rangle.$$
Since $B_x$ is an orthonormal basis,
$$\sum_{b_x \in B_x}\langle S(v_l), b_x\rangle\langle b_x, S(v_r)\rangle = \langle S(v_l), S(v_r)\rangle,$$
which implies that $\sum_{b_x \in B_x} w_\gamma(S \sqcup b_x) = w_{\widehat{\gamma}_v}(S)$. Concluding, we obtain
$$(\mathrm{A.5})\qquad \sum_{b \in D} w_\gamma(S \sqcup b) = |X|\, w_{\widehat{\gamma}_v}(S).$$

Second term. Let $b \in S(V_\gamma\setminus\{v\})$. Since $S$ is injective, there exists a unique $u \in V_\gamma\setminus\{v\}$ such that $b = S(u)$, therefore
$$w_\gamma(S \sqcup b) = \langle\cdot,\cdot\rangle \cdots \langle S(v_l), b\rangle\langle b, S(v_r)\rangle \cdots \langle\cdot,\cdot\rangle = w_{\gamma_u}(S).$$
Furthermore, observe that when $u = v_l$ or $u = v_r$ we have that $w_{\gamma_u}(S) = w_{\widehat{\gamma}_v}(S)$. In conclusion, we obtain
$$(\mathrm{A.6})\qquad \sum_{b \in S(V_\gamma\setminus\{v\})} w_\gamma(S \sqcup b) = 2\, w_{\widehat{\gamma}_v}(S) + \sum_{u \in V_\gamma\setminus\{v_l, v_r, v\}} w_{\gamma_u}(S).$$
Combining (A.5) and (A.6) yields
$$\sum_{b \in D\setminus S(V_\gamma\setminus\{v\})} w_\gamma(S \sqcup b) = (|X| - 2)\, w_{\widehat{\gamma}_v}(S) - \sum_{u \in V_\gamma\setminus\{v_l, v_r, v\}} w_{\gamma_u}(S).$$
Substituting the above in (A.4) yields
$$(\mathrm{A.7})\qquad \mathbb{E}w_\gamma = (|X| - 2)\,|\Omega(V_\gamma)|^{-1}\sum_{S \in \Omega(V_\gamma\setminus\{v\})} w_{\widehat{\gamma}_v}(S) - \sum_{u \in V_\gamma\setminus\{v_l, v_r, v\}} |\Omega(V_\gamma)|^{-1}\sum_{S \in \Omega(V_\gamma\setminus\{v\})} w_{\gamma_u}(S).$$
Finally, a direct counting argument reveals that
$$|\Omega(V_\gamma)| \approx p\,|X|\,|\Omega(V_{\widehat{\gamma}_v})|,\qquad |\Omega(V_\gamma)| \approx p\,|X|\,|\Omega(V_{\gamma_u})|.$$
Hence (A.7) yields
$$\mathbb{E}w_\gamma \approx p^{-1}\,\mathbb{E}w_{\widehat{\gamma}_v} - \sum_{u \in V_\gamma\setminus\{v_l, v_r, v\}} (p\,|X|)^{-1}\,\mathbb{E}w_{\gamma_u}.$$
This concludes the proof of the proposition.

References

[A] Artin M., Algebra. Prentice Hall, Inc., Englewood Cliffs, NJ (1991).
[B] Borel A., Linear algebraic groups. Graduate Texts in Mathematics, 126. Springer-Verlag, New York (1991).
[BDE] Bruckstein A.M., Donoho D.L. and Elad M., From sparse solutions of systems of equations to sparse modeling of signals and images. To appear in SIAM Review (2007).
[DE] Donoho D.L. and Elad M., Optimally sparse representation in general (non-orthogonal) dictionaries via l_1 minimization. Proc. Natl. Acad. Sci. USA 100, no. 5, 2197-2202 (2003).
[DGM] Daubechies I., Grossmann A. and Meyer Y., Painless non-orthogonal expansions. J. Math. Phys. 27 (5), pp. 1271-1283 (1986).
[EB] Elad M. and Bruckstein A.M., A generalized uncertainty principle and sparse representation in pairs of bases. IEEE Trans. on Information Theory, Vol. 48, pp. 2558-2567 (2002).
[GG] Golomb S.W. and Gong G., Signal design for good correlation. For wireless communication, cryptography, and radar. Cambridge University Press, Cambridge (2005).
[GH] Gurevich S. and Hadani R., Self-reducibility of the Weil representation and applications. arXiv:math/0612765 (2005).
[GHS] Gurevich S., Hadani R. and Sochen N., The finite harmonic oscillator and its associated sequences. Proceedings of the National Academy of Sciences of the United States of America (accepted, Feb. 2008).
[GMS] Gilbert A.C., Muthukrishnan S. and Strauss M.J., Approximation of functions over redundant dictionaries using coherence. Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms (Baltimore, MD, 2003), 243-252, ACM, New York (2003).
[GN] Gribonval R. and Nielsen M., Sparse representations in unions of bases. IEEE Trans. Inform. Theory 49, no. 12, 3320-3325 (2003).
[H] Howe R., Nice error bases, mutually unbiased bases, induced representations, the Heisenberg group and finite geometries. Indag. Math. (N.S.) 16, no. 3-4, 553-583 (2005).
[HCM] Howard S.D., Calderbank A.R. and Moran W., The finite Heisenberg-Weyl groups in radar and communications. EURASIP J. Appl. Signal Process. (2006).
[S] Sarwate D.V., Meeting the Welch bound with equality. Sequences and their applications (Singapore, 1998), 79-102, Springer Ser. Discrete Math. Theor. Comput. Sci., Springer, London (1999).
[Se] Serre J.P., Linear representations of finite groups. Graduate Texts in Mathematics, Vol. 42. Springer-Verlag, New York-Heidelberg (1977).
[SH] Strohmer T. and Heath R.W. Jr., Grassmannian frames with applications to coding and communication. Appl. Comput. Harmon. Anal. 14, no. 3 (2003).
[T] Terras A., Fourier analysis on finite groups and applications. London Mathematical Society Student Texts, 43. Cambridge University Press, Cambridge (1999).
[Tr] Tropp J.A., Greed is good: algorithmic results for sparse approximation. IEEE Trans. Inform. Theory 50, no. 10, 2231-2242 (2004).
[W] Weil A., Sur certains groupes d'opérateurs unitaires. Acta Math. 111, 143-211 (1964).

Department of Mathematics, University of California, Berkeley, CA 94720, USA.
E-mail address: [email protected]

Department of Mathematics, University of Chicago, IL 60637, USA.
E-mail address: [email protected]