arXiv:1509.00748v1 [math.FA] 2 Sep 2015
AN ELEMENTARY APPROACH TO THE PROBLEM OF COLUMN
SELECTION IN A RECTANGULAR MATRIX
STÉPHANE CHRÉTIEN AND SÉBASTIEN DARSES
Abstract. The problem of extracting a well conditioned submatrix from any rectangular matrix (with normalized columns) has been studied for some time in functional and harmonic analysis; see [1, 4, 6] for methods using random column selection. More constructive approaches have been proposed recently; see the recent contributions of [3, 7]. The column selection problem we consider in this paper is concerned with extracting a well conditioned submatrix, i.e. a matrix whose singular values all lie in [1 − ε, 1 + ε]. We provide individual lower and upper bounds for each singular value of the extracted matrix at the price of conceding only one log factor in the number of columns, when compared to the Restricted Invertibility Theorem of Bourgain and Tzafriri. Our method is fully constructive and the proof is short and elementary.
1. Introduction
Let X ∈ ℝ^{n×p} be a matrix such that all columns of X have unit euclidean ℓ2-norm. We denote by ‖x‖_2 the ℓ2-norm of a vector x and by ‖X‖ (resp. ‖X‖_HS) the associated operator norm (resp. the Hilbert–Schmidt norm). Let X_T denote the submatrix of X obtained by extracting the columns of X indexed by T ⊂ {1, . . . , p}. For any real symmetric matrix A, let λ_k(A) denote the k-th eigenvalue of A, and we order the eigenvalues as λ_1(A) ≥ λ_2(A) ≥ · · · . We also write λ_min(A) (resp. λ_max(A)) for the smallest (resp. largest) eigenvalue of A. We finally write |S| for the size of a set S.
The problem of well conditioned column selection that we consider here consists in finding the largest subset of columns of X such that the corresponding submatrix has all singular values in a prescribed interval [1 − ε, 1 + ε]. The one-sided problem of finding the largest possible T such that λ_min(X_T^t X_T) ≥ 1 − ε is called the Restricted Invertibility Problem and has a long history starting with the seminal work of Bourgain and Tzafriri [1]. Applications of such results are well known in the domain of harmonic analysis [1]. The condition number is also the subject of extensive study in statistics and signal processing [5].
Here, we propose an elementary approach to this problem based on two simple ingredients:
(1) choosing recursively y ∈ V, the set of remaining columns of X, verifying

Q(y) ≤ (1/|V|) ∑_{x∈V} Q(x),

where Q is a relevant quantity depending on the previously chosen vectors;
(2) a well-known equation (sometimes called the secular equation) whose roots are the eigenvalues of a square matrix after appending a row and a column.

We obtain a slightly weaker bound (up to a log factor) on the size of the largest subset of columns involved, but also a more precise result: equispaced upper and lower bounds for all ordered individual singular values of the extracted matrix X_T.
1.1. Historical background. Concerning the Restricted Invertibility problem, Bourgain and Tzafriri [1] obtained the following result for square matrices:
Theorem 1.1 ([1]). Given a p × p matrix X whose columns have unit ℓ2-norm, there exists T ⊂ {1, . . . , p} with |T| ≥ d p/‖X‖² such that C ≤ λ_min(X_T^t X_T), where d and C are absolute constants.
See also [4] for a simpler proof. Vershynin [6] generalized Bourgain and Tzafriri's result to the case of rectangular matrices and improved the estimate of |T| as follows.
Theorem 1.2 ([6]). Given an n × p matrix X, let X̃ be the matrix obtained from X by ℓ2-normalizing its columns. Then, for any ε ∈ (0, 1), there exists T ⊂ {1, . . . , p} with

|T| ≥ (1 − ε) ‖X‖²_HS / ‖X‖²

such that C_1(ε) ≤ λ_min(X̃_T^t X̃_T) ≤ λ_max(X̃_T^t X̃_T) ≤ C_2(ε).
Recently, Spielman and Srivastava proposed in [3] a deterministic construction of T which allows them to obtain the following result.
Theorem 1.3 ([3]). Let X be a p × p matrix and ε ∈ (0, 1). Then there exists T ⊂ {1, . . . , p} with |T| ≥ (1 − ε)² ‖X‖²_HS / ‖X‖² such that ε² ‖X‖²_HS / p ≤ λ_min(X_T^t X_T).
The technique of proof relies on new constructions and inequalities which are thoroughly explained in the Bourbaki seminar of Naor [2]. Using these techniques, Youssef [7] improved Vershynin's result as follows.
Theorem 1.4 ([7]). Given an n × p matrix X, let X̃ be the matrix obtained from X by ℓ2-normalizing its columns. Then, for any ε ∈ (0, 1), there exists T ⊂ {1, . . . , p} with

|T| ≥ (ε²/9) ‖X‖²_HS / ‖X‖²

such that 1 − ε ≤ λ_min(X̃_T^t X̃_T) ≤ λ_max(X̃_T^t X̃_T) ≤ 1 + ε.
1.2. Our contribution. We propose a short and elementary proof of the following result:
Theorem 1.5. Given an n × p matrix X whose columns have unit ℓ2-norm and a constant ε ∈ (0, 1), there exists T ⊂ {1, . . . , p} with |T| ≥ R and

R log R ≤ (ε² / (4(1 + ε))) · p / ‖X‖²,   (1.1)

such that 1 − ε ≤ λ_min(X_T^t X_T) ≤ λ_max(X_T^t X_T) ≤ 1 + ε.
Notice that when the columns of X have unit ℓ2-norm, we have ‖X‖²_HS = Tr(XX^t) = p. The price to pay for this short proof is a log factor in (1.1), but we are able to obtain individual control of each eigenvalue (see Lemma 2.2), which might be interesting in its own right.
2. Proof of Theorem 1.5
2.1. Suitable choice of the extracted vectors. Consider the set of vectors V_0 = {x_1, . . . , x_p}. At step 1, choose y_1 ∈ V_0. By induction, let us be given y_1, . . . , y_r at step r. Let Y_r denote the matrix whose columns are y_1, . . . , y_r and let v_k be a unit eigenvector of Y_r^t Y_r associated to λ_{k,r} := λ_k(Y_r^t Y_r). Let us choose y_{r+1} ∈ V_r := {x_1, . . . , x_p} \ {y_1, . . . , y_r} so that
∑_{k=1}^{r} (v_k^t Y_r^t y_{r+1})²/k ≤ (1/(p − r)) ∑_{x∈V_r} ∑_{k=1}^{r} (v_k^t Y_r^t x)²/k = (1/(p − r)) ∑_{k=1}^{r} ∑_{x∈V_r} (v_k^t Y_r^t x)²/k.   (2.2)
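As an aside, the averaging rule (2.2) is directly implementable. The following minimal numpy sketch is our own illustration, not the authors' code: the function name, the choice of the first column, and the tie-breaking by minimization are all ours. At each step it picks a remaining column minimizing the left-hand side of (2.2), which in particular satisfies the averaging bound.

```python
import numpy as np

def greedy_select(X, R):
    """Greedy column selection in the spirit of rule (2.2):
    pick a remaining column y whose score
    Q(y) = sum_k (v_k^t Y_r^t y)^2 / k
    does not exceed the average over remaining columns
    (here: the column minimizing Q)."""
    p = X.shape[1]
    T = [0]  # start from the first column; any starting column works
    for _ in range(R - 1):
        Y = X[:, T]
        _, V = np.linalg.eigh(Y.T @ Y)      # eigenvectors of Y_r^t Y_r, ascending order
        V = V[:, ::-1]                      # reorder so v_k matches lambda_{k,r} (descending)
        w = 1.0 / np.arange(1, len(T) + 1)  # weights 1/k
        rest = [j for j in range(p) if j not in T]
        scores = [float(np.sum(w * (V.T @ (Y.T @ X[:, j])) ** 2)) for j in rest]
        T.append(rest[int(np.argmin(scores))])
    return T
```

For instance, on a random 50 × 20 matrix with ℓ2-normalized columns, `greedy_select(X, 5)` returns five distinct column indices.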
Lemma 2.1. For all r ≥ 1, y_{r+1} verifies

∑_{k=1}^{r} (v_k^t Y_r^t y_{r+1})²/k ≤ λ_{1,r} ‖X‖² log(r)/(p − r).
Proof. Let X_r be the matrix whose columns are the x ∈ V_r, i.e. X_r X_r^t = ∑_{x∈V_r} x x^t. Then

∑_{x∈V_r} (v_k^t Y_r^t x)² = Tr(Y_r v_k v_k^t Y_r^t X_r X_r^t) ≤ Tr(Y_r v_k v_k^t Y_r^t) ‖X_r X_r^t‖ ≤ λ_{k,r} ‖X‖²,

which yields the conclusion by plugging into (2.2) since λ_{k,r} ≤ λ_{1,r}. □
2.2. Controlling the individual eigenvalues. Let us define δ as

δ = √( (1 + ε) ‖X‖² log R / p ),

so that, from (1.1), 2δ√R ≤ ε.
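For completeness, the inequality 2δ√R ≤ ε follows from (1.1) by a one-line computation using only the definitions above:

```latex
4\delta^2 R \;=\; \frac{4(1+\varepsilon)\|X\|^2 \, R\log R}{p}
\;\le\; \frac{4(1+\varepsilon)\|X\|^2}{p}\cdot
\frac{\varepsilon^2}{4(1+\varepsilon)}\cdot\frac{p}{\|X\|^2}
\;=\; \varepsilon^2,
\qquad\text{hence}\qquad 2\delta\sqrt{R}\le\varepsilon.
```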
Lemma 2.2. For all r and k with 1 ≤ k ≤ r ≤ R, we have

1 − δ (r + k − 1)/√r ≤ λ_{k,r} ≤ 1 + δ (2r − k)/√r.   (2.3)
Proof. It is clear that (2.3) holds for r = 1 since then 1 is the only singular value, because the columns are supposed to be normalized.
Assume the induction hypothesis (H_r): for all k with 1 ≤ k ≤ r < R, (2.3) holds. Let us then show that (H_{r+1}) holds. By the Cauchy interlacing theorem, we have

λ_{k+1,r+1} ≤ λ_{k,r},   1 ≤ k ≤ r,
λ_{k+1,r+1} ≥ λ_{k+1,r},   0 ≤ k ≤ r − 1.
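These two interlacing inequalities are easy to check numerically. The sketch below is our own illustration (dimensions and seed are arbitrary): it appends one column to a random Y and compares the ordered eigenvalues of the two Gram matrices.

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.standard_normal((30, 6))    # plays the role of Y_r with r = 6
y = rng.standard_normal((30, 1))    # the appended column y_{r+1}
Yp = np.hstack([y, Y])              # Y_{r+1}
lam_r = np.sort(np.linalg.eigvalsh(Y.T @ Y))[::-1]    # lambda_{k,r}, decreasing
lam_r1 = np.sort(np.linalg.eigvalsh(Yp.T @ Yp))[::-1] # lambda_{k,r+1}, decreasing
tol = 1e-10
# lambda_{k+1,r+1} <= lambda_{k,r}: each old eigenvalue dominates the next new one
assert all(lam_r1[k + 1] <= lam_r[k] + tol for k in range(6))
# lambda_{k+1,r+1} >= lambda_{k+1,r}: the top r eigenvalues can only grow
assert all(lam_r1[k] >= lam_r[k] - tol for k in range(6))
```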
Using (r + 1)(2r − k)² ≤ r(2r + 1 − k)² and (r + 1)(r + k)² ≤ r(r + 1 + k)², we thus deduce

λ_{k+1,r+1} ≤ 1 + δ (2r − k)/√r ≤ 1 + δ (2(r + 1) − (k + 1))/√(r + 1),   1 ≤ k ≤ r,
λ_{k+1,r+1} ≥ 1 − δ (r + k)/√r ≥ 1 − δ ((r + 1) + (k + 1) − 1)/√(r + 1),   0 ≤ k ≤ r − 1.
It remains to obtain the upper estimate for λ_{1,r+1} and the lower one for λ_{r+1,r+1}. We write

Y_{r+1}^t Y_{r+1} = [y_{r+1} Y_r]^t [y_{r+1} Y_r] =
[ 1               y_{r+1}^t Y_r ]
[ Y_r^t y_{r+1}   Y_r^t Y_r    ],   (2.4)
and it is well known that the eigenvalues of Y_{r+1}^t Y_{r+1} are the zeros of the secular equation

q(λ) := 1 − λ + ∑_{k=1}^{r} (v_k^t Y_r^t y_{r+1})² / (λ − λ_{k,r}) = 0.   (2.5)
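The identity behind (2.5) admits a quick numerical sanity check. In the sketch below (our illustration; dimensions, seed, and names are ours), every eigenvalue of the bordered Gram matrix (2.4) is a zero of q, up to floating-point error:

```python
import numpy as np

rng = np.random.default_rng(2)
Y = rng.standard_normal((20, 4))    # plays the role of Y_r with r = 4
y = rng.standard_normal(20)
y /= np.linalg.norm(y)              # unit column: gives the 1 in (2.4)
lam, V = np.linalg.eigh(Y.T @ Y)    # lambda_{k,r} and eigenvectors v_k
c = (V.T @ (Y.T @ y)) ** 2          # coefficients (v_k^t Y_r^t y)^2

def q(l):
    # the secular function of (2.5)
    return 1.0 - l + np.sum(c / (l - lam))

Yp = np.column_stack([y, Y])        # the bordered matrix [y_{r+1} Y_r]
for mu in np.linalg.eigvalsh(Yp.T @ Yp):
    assert abs(q(mu)) < 1e-6        # mu solves the secular equation (2.5)
```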
We first estimate λ_{1,r+1}, which is the greatest zero of q, and assume for contradiction that

λ_{1,r+1} > 1 + 2δ√r.   (2.6)

From (H_r), we then obtain that for λ ≥ 1 + 2δ√r ≥ λ_{1,r} + δ/√r,
q(λ) ≤ 1 − λ + (√r/δ) ∑_{k=1}^{r} (v_k^t Y_r^t y_{r+1})²/k =: g(λ).
Let λ_0 be the zero of g. We have g(λ_{1,r+1}) ≥ q(λ_{1,r+1}) = 0 = g(λ_0). But g is decreasing, so

λ_{1,r+1} ≤ λ_0 = 1 + (√r/δ) ∑_{k=1}^{r} (v_k^t Y_r^t y_{r+1})²/k.
By (H_r), λ_{1,r} ≤ 1 + 2δ√R ≤ 1 + ε. Thus, using Lemma 2.1 and noting that r ≤ p/2,

λ_{1,r+1} ≤ 1 + (2√r/δ) · (1 + ε)‖X‖² log(R)/p = 1 + 2δ√r,

which contradicts inequality (2.6). Thus λ_{1,r+1} ≤ 1 + 2δ√r, and therefore λ_{1,r+1} ≤ 1 + δ (2r + 1)/√(r + 1). This shows that the upper bound in (H_{r+1}) holds.
Finally, to estimate λ_{r+1,r+1}, which is the smallest zero of q, we write using (H_r) that for λ ≤ 1 − 2δ√r ≤ λ_{r,r} − δ/√r,

q(λ) ≥ 1 − λ − (√r/δ) ∑_{k=1}^{r} (v_k^t Y_r^t y_{r+1})²/k =: g̃(λ).

By the same reasoning as above, we prove by contradiction that λ_{r+1,r+1} ≥ 1 − 2δ√r, which gives λ_{r+1,r+1} ≥ 1 − δ (2r + 1)/√(r + 1) and shows that the lower bound in (H_{r+1}) holds. This completes the proof of Lemma 2.2. □
In particular, we have for all r ≤ R, λ_{1,r} ≤ 1 + 2δ√R ≤ 1 + ε and λ_{r,r} ≥ 1 − 2δ√R ≥ 1 − ε. This concludes the proof of Theorem 1.5.
Remark 2.3. Many other induction hypotheses may be proposed: λ_{k,r} ≤ u(k, r), where u is required to verify u(k, r) ≤ u(k + 1, r + 1). The criterion used to choose the next vector y_{r+1} then has to be modified accordingly. For instance, it can also be proven that one can extract a submatrix so that λ_{k,r} ≤ 1 + δ√(r − k). This yields as well the weaker bound with the log factor.
References
1. Bourgain, J. and Tzafriri, L., Invertibility of "large" submatrices with applications to the geometry of Banach spaces and harmonic analysis. Israel J. Math. 57 (1987), no. 2, 137–224.
2. Naor, A., Sparse quadratic forms and their geometric applications [following Batson, Spielman and Srivastava]. Séminaire Bourbaki: Vol. 2010/2011. Exposés 1027–1042. Astérisque No. 348 (2012), Exp. No. 1033, 189–217.
3. Spielman, D. A. and Srivastava, N., An elementary proof of the restricted invertibility theorem. Israel J. Math. 190 (2012), 83–91.
4. Tropp, J., The random paving property for uniformly bounded matrices. Studia Math. 185 (2008), no. 1, 67–82.
5. Tropp, J., Norms of random submatrices and sparse approximation. C. R. Acad. Sci. Paris, Ser. I 346 (2008), 1271–1274.
6. Vershynin, R., John's decompositions: selecting a large part. Israel J. Math. 122 (2001), 253–277.
7. Youssef, P., A note on column subset selection. Int. Math. Res. Not. IMRN 2014, no. 23, 6431–6447.
National Physical Laboratory, Hampton road, Teddington TW11 0LW, UK
E-mail address: [email protected]
LATP, UMR 6632, Université Aix-Marseille, Technopôle Château-Gombert, 39 rue Joliot
Curie, 13453 Marseille Cedex 13, France
E-mail address: [email protected]