
arXiv:1509.00748v1 [math.FA] 2 Sep 2015

AN ELEMENTARY APPROACH TO THE PROBLEM OF COLUMN

SELECTION IN A RECTANGULAR MATRIX

STÉPHANE CHRÉTIEN AND SÉBASTIEN DARSES

Abstract. The problem of extracting a well conditioned submatrix from any rectangular matrix (with normalized columns) has been studied for some time in functional and harmonic analysis; see [1, 4, 6] for methods using random column selection. More constructive approaches have been proposed recently; see the recent contributions of [3, 7]. The column selection problem we consider in this paper is concerned with extracting a well conditioned submatrix, i.e. a matrix whose singular values all lie in [1 − ε, 1 + ε]. We provide individual lower and upper bounds for each singular value of the extracted matrix at the price of conceding only one log factor in the number of columns, when compared to the Restricted Invertibility Theorem of Bourgain and Tzafriri. Our method is fully constructive and the proof is short and elementary.

1. Introduction

Let X ∈ R^{n×p} be a matrix such that all columns of X have unit Euclidean ℓ2-norm. We denote by ‖x‖_2 the ℓ2-norm of a vector x and by ‖X‖ (resp. ‖X‖_HS) the associated operator norm (resp. the Hilbert–Schmidt norm). Let X_T denote the submatrix of X obtained by extracting the columns of X indexed by T ⊂ {1, . . . , p}. For any real symmetric matrix A, let λ_k(A) denote the k-th eigenvalue of A, and we order the eigenvalues as λ_1(A) ≥ λ_2(A) ≥ · · · . We also write λ_min(A) (resp. λ_max(A)) for the smallest (resp. largest) eigenvalue of A. We finally write |S| for the size of a set S.

The problem of well conditioned column selection that we consider here consists in finding the largest subset of columns of X such that the corresponding submatrix has all singular values in a prescribed interval [1 − ε, 1 + ε]. The one-sided problem of finding the largest possible T such that λ_min(X_T^t X_T) ≥ 1 − ε is called the Restricted Invertibility Problem and has a long history starting with the seminal work of Bourgain and Tzafriri [1]. Applications of such results are well known in the domain of harmonic analysis [1]. The condition number is also a subject of extensive study in statistics and signal processing [5].

Here, we propose an elementary approach to this problem based on two simple ingredients:

(1) choosing recursively y ∈ V, the set of remaining columns of X, verifying

    Q(y) ≤ (1/|V|) Σ_{x∈V} Q(x),

where Q is a relevant quantity depending on the previously chosen vectors;

(2) a well-known equation (sometimes called the secular equation) whose roots are the eigenvalues of a square matrix after appending a row and a column.

We obtain a slightly weaker bound (up to a log factor) on the size of the extracted subset of columns, but also a more precise result: equispaced upper and lower bounds for all ordered individual singular values of the extracted matrix X_T.


1.1. Historical background. Concerning the Restricted Invertibility problem, Bourgain and Tzafriri [1] obtained the following result for square matrices:

Theorem 1.1 ([1]). Given a p × p matrix X whose columns have unit ℓ2-norm, there exists T ⊂ {1, . . . , p} with |T| ≥ dp/‖X‖^2 such that C ≤ λ_min(X_T^t X_T), where d and C are absolute constants.

See also [4] for a simpler proof. Vershynin [6] generalized Bourgain and Tzafriri's result to the case of rectangular matrices, and the estimate of |T| was improved as follows.

Theorem 1.2 ([6]). Given an n × p matrix X, let X̃ be the matrix obtained from X by ℓ2-normalizing its columns. Then, for any ε ∈ (0, 1), there exists T ⊂ {1, . . . , p} with

    |T| ≥ (1 − ε) ‖X‖_HS^2 / ‖X‖^2

such that C_1(ε) ≤ λ_min(X̃_T^t X̃_T) ≤ λ_max(X̃_T^t X̃_T) ≤ C_2(ε).

Recently, Spielman and Srivastava proposed in [3] a deterministic construction of T which allows them to obtain the following result.

Theorem 1.3 ([3]). Let X be a p × p matrix and ε ∈ (0, 1). Then there exists T ⊂ {1, . . . , p} with |T| ≥ (1 − ε)^2 ‖X‖_HS^2 / ‖X‖^2 such that

    ε^2 ‖X‖^2 / p ≤ λ_min(X_T^t X_T).

The technique of proof relies on new constructions and inequalities which are thoroughly explained in the Bourbaki seminar of Naor [2]. Using these techniques, Youssef [7] improved Vershynin's result as follows:

Theorem 1.4 ([7]). Given an n × p matrix X, let X̃ be the matrix obtained from X by ℓ2-normalizing its columns. Then, for any ε ∈ (0, 1), there exists T ⊂ {1, . . . , p} with

    |T| ≥ (ε^2/9) ‖X‖_HS^2 / ‖X‖^2

such that 1 − ε ≤ λ_min(X̃_T^t X̃_T) ≤ λ_max(X̃_T^t X̃_T) ≤ 1 + ε.

1.2. Our contribution. We propose a short and elementary proof of the following result:

Theorem 1.5. Given an n × p matrix X whose columns have unit ℓ2-norm and a constant ε ∈ (0, 1), there exists T ⊂ {1, . . . , p} with |T| ≥ R and

    R log R ≤ (ε^2 / (4(1 + ε))) · (p / ‖X‖^2),    (1.1)

such that 1 − ε ≤ λ_min(X_T^t X_T) ≤ λ_max(X_T^t X_T) ≤ 1 + ε.

Notice that when the columns of X have unit ℓ2-norm, we have ‖X‖_HS^2 = Tr(XX^t) = p. The price to pay for this short proof is a log factor in (1.1), but we are able to obtain an individual control of each eigenvalue (see Lemma 2.2), which might be interesting in its own right.
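This identity is easy to check numerically. A minimal Python sketch (the matrix below is randomly generated for illustration; it does not come from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 8

# Build a random matrix and normalize its columns to unit l2-norm.
X = rng.standard_normal((n, p))
X /= np.linalg.norm(X, axis=0)

# ||X||_HS^2 = Tr(X X^t) equals the sum of squared column norms, i.e. p.
hs_sq = np.trace(X @ X.T)
print(hs_sq)  # ≈ 8.0
```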

2. Proof of Theorem 1.5

2.1. Suitable choice of the extracted vectors. Consider the set of vectors V_0 = {x_1, . . . , x_p}. At step 1, choose y_1 ∈ V_0. By induction, let us be given y_1, . . . , y_r at step r. Let Y_r denote the matrix whose columns are y_1, . . . , y_r and let v_k be a unit eigenvector of Y_r^t Y_r associated to λ_{k,r} := λ_k(Y_r^t Y_r). Let us choose y_{r+1} ∈ V_r := {x_1, . . . , x_p} \ {y_1, . . . , y_r} so that

    Σ_{k=1}^{r} (v_k^t Y_r^t y_{r+1})^2 / k ≤ (1/(p − r)) Σ_{x∈V_r} Σ_{k=1}^{r} (v_k^t Y_r^t x)^2 / k = (1/(p − r)) Σ_{k=1}^{r} (1/k) Σ_{x∈V_r} (v_k^t Y_r^t x)^2.    (2.2)
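Since the minimum of Q(x) := Σ_{k=1}^{r} (v_k^t Y_r^t x)^2/k over V_r is at most its average, the choice in (2.2) can always be made by taking a minimizer. The following Python sketch implements this greedy extraction with numpy; the function name, the initial column, and the stopping size R are illustrative choices, not prescribed by the paper.

```python
import numpy as np

def extract_columns(X, R):
    """Greedy column selection following rule (2.2): at each step append
    the remaining column x minimizing Q(x) = sum_k (v_k^t Y_r^t x)^2 / k,
    which is in particular below the average of Q over remaining columns."""
    n, p = X.shape
    chosen = [0]                     # start from any column, e.g. the first
    remaining = list(range(1, p))
    while len(chosen) < R:
        Y = X[:, chosen]             # Y_r: columns selected so far
        lam, V = np.linalg.eigh(Y.T @ Y)
        # Projections (v_k^t Y_r^t x) for every remaining column x.
        proj = V.T @ (Y.T @ X[:, remaining])        # shape (r, |V_r|)
        k = np.arange(1, len(chosen) + 1)[:, None]  # weights 1/k
        # eigh returns ascending eigenvalues; weight 1/k in (2.2) is
        # attached to the k-th *largest* eigenvalue, hence the reversal.
        Q = (proj[::-1] ** 2 / k).sum(axis=0)
        j = int(np.argmin(Q))        # the minimizer is below the average
        chosen.append(remaining.pop(j))
    return chosen

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 200))
X /= np.linalg.norm(X, axis=0)
T = extract_columns(X, 6)
evals = np.linalg.eigvalsh(X[:, T].T @ X[:, T])
print(T, evals.min(), evals.max())
```

Picking the argmin rather than an arbitrary below-average column is simply the easiest way to guarantee (2.2); any column whose Q is below the average would do.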

Lemma 2.1. For all r ≥ 1, y_{r+1} verifies

    Σ_{k=1}^{r} (v_k^t Y_r^t y_{r+1})^2 / k ≤ λ_{1,r} ‖X‖^2 log(r) / (p − r).

Proof. Let X_r be the matrix whose columns are the x ∈ V_r, i.e. X_r X_r^t = Σ_{x∈V_r} x x^t. Then

    Σ_{x∈V_r} (v_k^t Y_r^t x)^2 = Tr(Y_r v_k v_k^t Y_r^t X_r X_r^t) ≤ Tr(Y_r v_k v_k^t Y_r^t) ‖X_r X_r^t‖ ≤ λ_{k,r} ‖X‖^2,

which yields the conclusion by plugging into (2.2), since λ_{k,r} ≤ λ_{1,r}. □
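The trace inequality in the proof can be observed numerically. A small Python sketch (the split of the columns into Y_r and V_r below is an arbitrary illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, r = 20, 30, 5

X = rng.standard_normal((n, p))
X /= np.linalg.norm(X, axis=0)

Yr = X[:, :r]   # the r columns chosen so far (arbitrary here)
Xr = X[:, r:]   # the remaining columns, i.e. V_r

lam, V = np.linalg.eigh(Yr.T @ Yr)       # eigenpairs of Y_r^t Y_r
opnorm_sq = np.linalg.norm(X, 2) ** 2    # operator norm squared ||X||^2

# For each k: sum over x in V_r of (v_k^t Y_r^t x)^2 <= lambda_{k,r} ||X||^2.
lhs = ((V.T @ (Yr.T @ Xr)) ** 2).sum(axis=1)
print(np.all(lhs <= lam * opnorm_sq + 1e-9))  # True
```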

2.2. Controlling the individual eigenvalues. Let us define δ as

    δ = √((1 + ε) ‖X‖^2 log R / p),

so that, from (1.1), 2δ√R ≤ ε (indeed, 4δ^2 R = 4(1 + ε)‖X‖^2 R log R / p ≤ ε^2 by (1.1)).

Lemma 2.2. For all r and k with 1 ≤ k ≤ r ≤ R, we have

    1 − δ (r + k − 1)/√r ≤ λ_{k,r} ≤ 1 + δ (2r − k)/√r.    (2.3)

Proof. It is clear that (2.3) holds for r = 1 since then 1 is the only singular value, because the columns are supposed to be normalized.

Assume the induction hypothesis (H_r): for all k with 1 ≤ k ≤ r < R, (2.3) holds. Let us then show that (H_{r+1}) holds. By the Cauchy interlacing theorem, we have

    λ_{k+1,r+1} ≤ λ_{k,r},      1 ≤ k ≤ r,
    λ_{k+1,r+1} ≥ λ_{k+1,r},    0 ≤ k ≤ r − 1.

Using (r + 1)(2r − k)^2 ≤ r(2r + 1 − k)^2 and (r + 1)(r + k)^2 ≤ r(r + 1 + k)^2, we thus deduce

    λ_{k+1,r+1} ≤ 1 + δ (2r − k)/√r ≤ 1 + δ (2(r + 1) − (k + 1))/√(r + 1),      1 ≤ k ≤ r,
    λ_{k+1,r+1} ≥ 1 − δ (r + k)/√r ≥ 1 − δ ((r + 1) + (k + 1) − 1)/√(r + 1),    0 ≤ k ≤ r − 1.
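The Cauchy interlacing step is easy to check numerically; a brief Python sketch with a random Gram matrix (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 15, 6

# r+1 unit-norm columns; Y_r consists of the first r of them.
Y = rng.standard_normal((n, r + 1))
Y /= np.linalg.norm(Y, axis=0)

# Eigenvalues in decreasing order, as in the paper's convention.
lam_r = np.sort(np.linalg.eigvalsh(Y[:, :r].T @ Y[:, :r]))[::-1]
lam_r1 = np.sort(np.linalg.eigvalsh(Y.T @ Y))[::-1]

# Cauchy interlacing: lambda_{k+1,r+1} <= lambda_{k,r} <= lambda_{k,r+1}.
print(np.all(lam_r1[1:] <= lam_r + 1e-9) and np.all(lam_r <= lam_r1[:r] + 1e-9))  # True
```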

It remains to obtain the upper estimate for λ_{1,r+1} and the lower one for λ_{r+1,r+1}. We write

    Y_{r+1}^t Y_{r+1} = [y_{r+1} Y_r]^t [y_{r+1} Y_r] =
        [ 1                y_{r+1}^t Y_r ]
        [ Y_r^t y_{r+1}    Y_r^t Y_r     ],    (2.4)

and it is well known that the eigenvalues of Y_{r+1}^t Y_{r+1} are the zeros of the secular equation:

    q(λ) := 1 − λ + Σ_{k=1}^{r} (v_k^t Y_r^t y_{r+1})^2 / (λ − λ_{k,r}) = 0.    (2.5)
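That the eigenvalues of the bordered matrix (2.4) are the zeros of q in (2.5) is straightforward to verify numerically. A short Python sketch (random unit vectors, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(4)
n, r = 12, 4

Y = rng.standard_normal((n, r))
Y /= np.linalg.norm(Y, axis=0)
y = rng.standard_normal(n)
y /= np.linalg.norm(y)

lam, V = np.linalg.eigh(Y.T @ Y)   # lambda_{k,r} and eigenvectors v_k
c = (V.T @ (Y.T @ y)) ** 2         # coefficients (v_k^t Y_r^t y)^2

# Bordered Gram matrix as in (2.4): note y^t y = 1 in the top-left corner.
Yp = np.column_stack([y, Y])
mu = np.linalg.eigvalsh(Yp.T @ Yp)  # eigenvalues of Y_{r+1}^t Y_{r+1}

# Secular equation (2.5): q(t) = 1 - t + sum_k c_k / (t - lambda_k).
q = lambda t: 1.0 - t + np.sum(c / (t - lam))
residuals = np.array([q(t) for t in mu])
print(np.abs(residuals).max())  # close to 0
```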


We first estimate λ_{1,r+1}, which is the greatest zero of q, and assume for contradiction that

    λ_{1,r+1} > 1 + 2δ√r.    (2.6)

From (H_r), we then obtain that for λ ≥ 1 + 2δ√r ≥ λ_{1,r} + δ/√r,

    q(λ) ≤ 1 − λ + (√r/δ) Σ_{k=1}^{r} (v_k^t Y_r^t y_{r+1})^2 / k =: g(λ).

Let λ_0 be the zero of g. We have g(λ_{1,r+1}) ≥ q(λ_{1,r+1}) = 0 = g(λ_0). But g is decreasing, so

    λ_{1,r+1} ≤ λ_0 = 1 + (√r/δ) Σ_{k=1}^{r} (v_k^t Y_r^t y_{r+1})^2 / k.

By (H_r), λ_{1,r} ≤ 1 + 2δ√R ≤ 1 + ε. Thus, using Lemma 2.1 and noting that r ≤ p/2,

    λ_{1,r+1} ≤ 1 + (2√r/δ) (1 + ε)‖X‖^2 log(R) / p = 1 + 2δ√r,

which yields a contradiction with inequality (2.6). Thus, we have that λ_{1,r+1} ≤ 1 + 2δ√r, and therefore λ_{1,r+1} ≤ 1 + δ(2r + 1)/√(r + 1). This shows that the upper bound in (H_{r+1}) holds.

Finally, to estimate λ_{r+1,r+1}, which is the smallest zero of q, we write using (H_r) that for λ ≤ 1 − 2δ√r ≤ λ_{r,r} − δ/√r,

    q(λ) ≥ 1 − λ − (√r/δ) Σ_{k=1}^{r} (v_k^t Y_r^t y_{r+1})^2 / k =: g̃(λ).

By means of the same reasoning as above, we prove by contradiction that λ_{r+1,r+1} ≥ 1 − 2δ√r, which gives λ_{r+1,r+1} ≥ 1 − δ(2r + 1)/√(r + 1) and shows that the lower bound in (H_{r+1}) holds. This completes the proof of Lemma 2.2. □

In particular, we have for all r ≤ R, λ_{1,r} ≤ 1 + 2δ√R ≤ 1 + ε and λ_{r,r} ≥ 1 − 2δ√R ≥ 1 − ε.

This concludes the proof of Theorem 1.5.

Remark 2.3. Many other induction hypotheses may be proposed: λ_{k,r} ≤ u(k, r), where u is required to verify u(k, r) ≤ u(k + 1, r + 1). The criterion used to choose the next vector y_{r+1} then has to be modified accordingly. For instance, it can also be proven that one can extract a submatrix so that λ_{k,r} ≤ 1 + δ√(r − k). This yields as well the weaker bound with the log factor.

References

1. Bourgain, J. and Tzafriri, L., Invertibility of "large" submatrices with applications to the geometry of Banach spaces and harmonic analysis. Israel J. Math. 57 (1987), no. 2, 137–224.

2. Naor, A., Sparse quadratic forms and their geometric applications [following Batson, Spielman and Srivastava]. Séminaire Bourbaki: Vol. 2010/2011. Exposés 1027–1042. Astérisque No. 348 (2012), Exp. No. 1033, viii, 189–217.

3. Spielman, D. A. and Srivastava, N., An elementary proof of the restricted invertibility theorem. Israel J. Math. 190 (2012), 83–91.

4. Tropp, J., The random paving property for uniformly bounded matrices. Studia Math. 185 (2008), no. 1, 67–82.

5. Tropp, J., Norms of random submatrices and sparse approximation. C. R. Acad. Sci. Paris, Ser. I 346 (2008), 1271–1274.

6. Vershynin, R., John's decompositions: selecting a large part. Israel J. Math. 122 (2001), 253–277.

7. Youssef, P., A note on column subset selection. Int. Math. Res. Not. IMRN 2014, no. 23, 6431–6447.


National Physical Laboratory, Hampton Road, Teddington TW11 0LW, UK

E-mail address: [email protected]

LATP, UMR 6632, Université Aix-Marseille, Technopôle Château-Gombert, 39 rue Joliot Curie, 13453 Marseille Cedex 13, France

E-mail address: [email protected]