
Random matrix theory in sparse recovery

Maryia Kabanava

RWTH Aachen University

CoSIP Winter Retreat 2016


Compressed sensing

Goal: reconstruction of (high-dimensional) signals from a minimal amount of measured data

Key ingredients:

Exploit low complexity of signals (e.g. sparsity/compressibility)

Efficient algorithms (e.g. convex optimization)

Randomness (random matrices)


Signal recovery problem

Signal x ∈ Rd is unknown.

Given:

Linear measurement map: M : Rd → Rm, m ≪ d.

Measurement vector: y = Mx + w ∈ Rm, ‖w‖2 ≤ η.

Goal: recover x from y.

Idea: recovery is possible if x belongs to a set of low complexity.

Standard compressed sensing: sparsity (small number of nonzero coefficients)

Cosparsity: sparsity after transformation

Structured sparsity: e.g. block sparsity

Low rank matrix recovery

Low rank tensor recovery


Noiseless model

[Figure: y = Mx with M of size m × d, m ≪ d (an under-determined linear system); the support S of x and its complement S^c are highlighted.]

supp x = S ⊂ {1, 2, . . . , d}

ℓ0-minimization: min_{z∈Rd} ‖z‖0 s.t. Mz = y (NP-hard)

ℓ1-minimization: min_{z∈Rd} ‖z‖1 s.t. Mz = y (efficient minimization methods)
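To make the ℓ1-minimization step concrete, here is a minimal numerical sketch (my own illustration, not part of the slides): it recasts min ‖z‖1 s.t. Mz = y as a linear program and solves it with scipy.optimize.linprog. The sizes d, m, s and the Gaussian choice of M are illustrative assumptions.

```python
# Minimal sketch (assumes numpy and scipy): basis pursuit as a linear program.
# min ||z||_1 s.t. Mz = y is rewritten with z = u - v, u, v >= 0, objective sum(u) + sum(v).
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
d, m, s = 100, 50, 5                          # illustrative sizes
M = rng.standard_normal((m, d)) / np.sqrt(m)  # Gaussian measurement matrix
x = np.zeros(d)
x[rng.choice(d, size=s, replace=False)] = rng.standard_normal(s)
y = M @ x

c = np.ones(2 * d)                            # sum(u) + sum(v) = ||u - v||_1 at the optimum
A_eq = np.hstack([M, -M])                     # M(u - v) = y
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
z_hat = res.x[:d] - res.x[d:]
print("l2 recovery error:", np.linalg.norm(z_hat - x))
```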


Nonuniform vs. uniform recovery

Nonuniform recovery
A fixed sparse (compressible) vector is recovered with high probability using M.
Sufficient conditions on M:
  Descent cone of ℓ1-norm at x intersects ker M trivially.
  Construct (approximate) dual certificate.

Uniform recovery
With high probability on M every sparse (compressible) vector is recovered.
Sufficient conditions on M:
  Null space property.
  Restricted isometry property.


Nonuniform recovery: descent cone

For fixed x ∈ Rd , we define the convex cone

T (x) = cone{z − x : z ∈ Rd , ‖z‖1 ≤ ‖x‖1}.

Theorem

Let M ∈ Rm×d. A vector x ∈ Rd is the unique minimizer of ‖z‖1 subject to Mz = Mx if and only if ker M ∩ T(x) = {0}.

[Figure: the affine space x + ker M meets the shifted descent cone x + T(x) only at x.]

Let S^{d−1} = {x ∈ Rd : ‖x‖2 = 1} and set T := T(x) ∩ S^{d−1}. If

inf_{x∈T} ‖Mx‖2 > 0, (1)

then ker M ∩ T = ∅ and ker M ∩ T(x) = {0}.

Uniform recovery: null space property (NSP)

M ∈ Rm×d is said to satisfy the stable NSP of order s with

0 < ρ < 1, if for any S ⊂ [d] with |S| ≤ s it holds

‖v_S‖1 < ρ‖v_{S^c}‖1 for all v ∈ ker M \ {0}. (2)

Theorem

Let M ∈ Rm×d satisfy (2). Then, for any x ∈ Rd the solution x̂ of

min_{z∈Rd} ‖z‖1 subject to Mz = y,

with y = Mx, approximates x with ℓ1-error

‖x − x̂‖1 ≤ (2(1 + ρ)/(1 − ρ)) σ_s(x)1, (3)

where σ_s(x)1 := inf {‖x − z‖1 : z is s-sparse}.
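To see what the bound (3) gives in practice, the following small sketch (my own, not from the slides) computes σ_s(x)1 as the ℓ1-mass of the d − s smallest entries of x and evaluates the right-hand side of (3) for an illustrative ρ.

```python
# Minimal sketch (assumes numpy): best s-term approximation error and the bound (3).
import numpy as np

def sigma_s_l1(x, s):
    # sigma_s(x)_1 = inf{||x - z||_1 : z s-sparse} = sum of the d - s smallest |x_i|
    a = np.sort(np.abs(x))
    return a[:-s].sum() if s > 0 else a.sum()

x = np.array([3.0, -2.0, 0.5, 0.1, -0.05, 0.01])  # illustrative compressible vector
s, rho = 2, 0.5                                    # illustrative sparsity level and NSP constant
print("sigma_s(x)_1 =", sigma_s_l1(x, s))
print("l1-error bound (3):", 2 * (1 + rho) / (1 - rho) * sigma_s_l1(x, s))
```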


Strategy to check NSP

Lemma

Let

T_{ρ,s} := {w ∈ Rd : ‖w_S‖1 ≥ ρ‖w_{S^c}‖1 for some S ⊂ [d], |S| ≤ s}.

Set T := T_{ρ,s} ∩ S^{d−1}. If

inf_{w∈T} ‖Mw‖2 > 0,

then for any v ∈ ker M \ {0} it holds

‖v_S‖1 < ρ‖v_{S^c}‖1 for all S ⊂ [d] with |S| ≤ s.


Uniform recovery: restricted isometry property (RIP)

Definition

The restricted isometry constant δs of a matrix M ∈ Rm×d is defined as the smallest δs such that

(1 − δs)‖x‖2² ≤ ‖Mx‖2² ≤ (1 + δs)‖x‖2² (4)

for all s-sparse x ∈ Rd.

Requires that all s-column submatrices of M are well-conditioned:

δs = max_{|S|≤s} ‖M_S^T M_S − Id‖_{2→2}

Implies stable NSP.

We say that M satisfies the restricted isometry property if δs is small for reasonably large s.
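For tiny dimensions, δs can be computed by brute force from the submatrix characterization above; the following sketch (my own, assuming numpy is available, and feasible only for small d and s) does exactly that for a rescaled Gaussian matrix.

```python
# Minimal sketch (assumes numpy): brute-force delta_s via the submatrix formula; only for tiny d, s.
import numpy as np
from itertools import combinations

def restricted_isometry_constant(M, s):
    d = M.shape[1]
    delta = 0.0
    for S in combinations(range(d), s):   # the maximum over |S| <= s is attained at |S| = s
        cols = list(S)
        G = M[:, cols].T @ M[:, cols] - np.eye(s)
        delta = max(delta, np.linalg.norm(G, 2))
    return delta

rng = np.random.default_rng(1)
m, d, s = 40, 12, 3                       # illustrative sizes; cost grows like binomial(d, s)
M = rng.standard_normal((m, d)) / np.sqrt(m)
print("delta_s =", restricted_isometry_constant(M, s))
```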


RIP implies recovery by ℓ1-minimization

(1 − δs)‖x‖2² ≤ ‖Mx‖2² ≤ (1 + δs)‖x‖2² (5)

Theorem

Assume that the restricted isometry constant of M ∈ Rm×d satisfies

δ2s < 1/√2 ≈ 0.7071.

Then ℓ1-minimization reconstructs every s-sparse vector x ∈ Rd from y = Mx.


Matrices satisfying recovery conditions

Open problem: Give explicit matrices M ∈ Rm×d that satisfy recovery conditions.

Goal: Successful recovery with M ∈ Rm×d, if

m ≥ C s ln^α(d),

for constants C and α.

Deterministic matrices are known for which m ≥ Cs².

Way out: consider random matrices.


Gaussian random variables

A standard Gaussian random variable X ∼ N(0, 1) has probability density function

ψ(x) = (1/√(2π)) e^{−x²/2}. (6)

1 The tail of X decays super-exponentially:

P(|X| > t) ≤ e^{−t²/2}, t > 0. (7)

2 The absolute moments of X can be computed as

(E|X|^p)^{1/p} = √2 (Γ((1 + p)/2)/Γ(1/2))^{1/p} = O(√p), p ≥ 1.

3 The moment generating function of X equals

E exp(tX) = e^{t²/2}, t ∈ R.
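The moment formula in item 2 is easy to verify numerically; this sketch (mine, assuming scipy is available) compares the closed form with a Monte Carlo estimate and with √p.

```python
# Minimal sketch (assumes scipy): check (E|X|^p)^{1/p} = sqrt(2) * (Gamma((1+p)/2)/Gamma(1/2))^{1/p}.
import numpy as np
from scipy.special import gamma

rng = np.random.default_rng(0)
samples = rng.standard_normal(1_000_000)
for p in (1, 2, 4, 8):
    exact = np.sqrt(2) * (gamma((1 + p) / 2) / gamma(0.5)) ** (1 / p)
    monte_carlo = np.mean(np.abs(samples) ** p) ** (1 / p)
    print(f"p = {p}: exact = {exact:.4f}, Monte Carlo = {monte_carlo:.4f}, sqrt(p) = {np.sqrt(p):.4f}")
```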


Subgaussian random variables

Lemma

Let X be a random variable with EX = 0. Then the following properties are equivalent.

1 Tails: There exist β, κ > 0 such that

P(|X| > t) ≤ βe^{−κt²} for all t > 0. (8)

2 Moments:

(E|X|^p)^{1/p} ≤ C√p for all p ≥ 1. (9)

3 Moment generating function:

E exp(tX) ≤ e^{ct²} for all t ∈ R. (10)

A random variable X with EX = 0 that satisfies one of the properties above is called subgaussian.


Subgaussian random variables: examples

1 Gaussian

2 Bernoulli: P{X = −1} = P{X = 1} = 1/2

3 Bounded: |X| ≤ M almost surely for some M
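For the Bernoulli example, subgaussianity follows directly from the moment generating function, E exp(tX) = cosh(t) ≤ e^{t²/2}, so property (10) holds with c = 1/2. A quick numerical check of this inequality (my own sketch):

```python
# Minimal sketch (assumes numpy): cosh(t) <= exp(t^2/2), i.e. a Rademacher variable satisfies (10) with c = 1/2.
import numpy as np

t = np.linspace(-10, 10, 2001)
print(np.all(np.cosh(t) <= np.exp(t**2 / 2)))   # expected output: True
```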


Hoeffding-type inequality

Theorem

Let X1, . . . , XN be a sequence of independent subgaussian random variables,

E exp(tXi) ≤ e^{ct²} for all t ∈ R and i ∈ {1, . . . , N}. (11)

For a ∈ RN, the random variable Z := Σ_{i=1}^N a_i X_i is subgaussian, i.e.

E exp(tZ) ≤ exp(c‖a‖2² t²) for all t ∈ R (12)

and

P(|Σ_{i=1}^N a_i X_i| ≥ t) ≤ 2 exp(−t²/(4c‖a‖2²)) for all t > 0. (13)
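The tail bound (13) can be checked by simulation. Rademacher variables satisfy (11) with c = 1/2, so the sketch below (my own illustration; the sizes are arbitrary) compares their empirical tails with the bound.

```python
# Minimal sketch (assumes numpy): empirical tails of Z = sum_i a_i X_i for Rademacher X_i vs. bound (13), c = 1/2.
import numpy as np

rng = np.random.default_rng(0)
N, trials, c = 50, 100_000, 0.5
a = rng.standard_normal(N)                       # fixed coefficient vector
X = rng.choice([-1.0, 1.0], size=(trials, N))    # independent Rademacher signs
Z = X @ a
norm_a = np.linalg.norm(a)
for t in (1.0 * norm_a, 2.0 * norm_a, 3.0 * norm_a):
    empirical = np.mean(np.abs(Z) >= t)
    bound = 2 * np.exp(-t**2 / (4 * c * norm_a**2))
    print(f"t = {t:6.2f}: empirical = {empirical:.4f} <= bound = {bound:.4f}")
```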


Subexponential random variables

A random variable X with EX = 0 is called subexponential if there exist β, κ > 0 such that

P(|X| > t) ≤ βe^{−κt} for all t > 0. (14)

Theorem (Bernstein-type inequality)

Let X1, . . . , XN be a sequence of independent subexponential random variables,

P(|Xi| > t) ≤ βe^{−κt} for all t > 0 and i ∈ {1, . . . , N}. (15)

Then

P(|Σ_{i=1}^N Xi| ≥ t) ≤ 2 exp(−(κt)²/(2(2βN + κt))) for all t > 0. (16)
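As an illustration of the Bernstein-type inequality (my own sketch, not from the slides): centered Exp(1) variables Xi = Ei − 1 satisfy P(|Xi| > t) ≤ e·e^{−t}, i.e. (15) with β = e and κ = 1, so their empirical tails can be compared with (16).

```python
# Minimal sketch (assumes numpy): centered Exp(1) variables are subexponential with beta = e, kappa = 1;
# compare the empirical tail of their sum with the Bernstein-type bound (16).
import numpy as np

rng = np.random.default_rng(0)
N, trials, beta, kappa = 100, 50_000, np.e, 1.0
sums = (rng.exponential(1.0, size=(trials, N)) - 1.0).sum(axis=1)
for t in (20.0, 30.0, 40.0):
    empirical = np.mean(np.abs(sums) >= t)
    bound = 2 * np.exp(-(kappa * t) ** 2 / (2 * (2 * beta * N + kappa * t)))
    print(f"t = {t}: empirical = {empirical:.4f} <= bound = {bound:.4f}")
```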


Random matrices

Definition

Let M ∈ Rm×d be a random matrix.

If the entries of M are independent Bernoulli variables (i.e. taking values ±1 with equal probability), then M is called a Bernoulli random matrix.

If the entries of M are independent standard Gaussian random variables, then M is called a Gaussian random matrix.

If the entries of M are independent subgaussian random variables,

P(|Mjk| ≥ t) ≤ βe^{−κt²} for all t > 0,

then M is called a subgaussian random matrix.


RIP for subgaussian random matrices

Theorem

Let M ∈ Rm×d be a subgaussian random matrix. Then there exists C = C(β, κ) > 0 such that the restricted isometry constant of (1/√m)M satisfies δs ≤ δ with probability at least 1 − ε provided

m ≥ Cδ^{−2} (s ln(ed/s) + ln(2ε^{−1})). (17)


Random matrices with subgaussian rows

Let Y ∈ Rd be random.

If E|〈Y, x〉|² = ‖x‖2² for all x ∈ Rd, then Y is called isotropic.

If, for all x ∈ Rd with ‖x‖2 = 1, the random variable 〈Y, x〉 is subgaussian,

E exp(t〈Y, x〉) ≤ exp(ct²) for all t ∈ R (c is indep. of x),

then Y is called a subgaussian random vector.

Theorem

Let M ∈ Rm×d be random with independent, isotropic, subgaussian rows with the same parameter c. If

m ≥ Cδ^{−2} (s ln(ed/s) + ln(2ε^{−1})), (18)

then the restricted isometry constant of (1/√m)M satisfies δs ≤ δ w.p. at least 1 − ε.


Ingredients of the proof: concentration inequality

Let M ∈ Rm×d be random with independent, isotropic, subgaussian rows. Then, for all x ∈ Rd and every t ∈ (0, 1),

P(|m^{−1}‖Mx‖2² − ‖x‖2²| ≥ t‖x‖2²) ≤ 2 exp(−ct²m). (19)

Proof.

Let x ∈ Rd, ‖x‖2 = 1. Denote the rows of M by Y1, . . . , Ym ∈ Rd. Define

Zi = |〈Yi, x〉|² − ‖x‖2², i = 1, . . . , m.

EZi = 0, P(|Zi| ≥ r) ≤ β exp(−κr)

m^{−1}‖Mx‖2² − ‖x‖2² = m^{−1} Σ_{i=1}^m Zi

Bernstein inequality:

P(|m^{−1} Σ_{i=1}^m Zi| ≥ t) ≤ 2 exp(−κ²mt²/(4β + 2κt))
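The concentration inequality (19) can be visualized by simulation in the Gaussian case: for a unit vector x the inner products 〈Yi, x〉 are i.i.d. standard Gaussian, so m^{−1}‖Mx‖2² is an average of squared Gaussians. The sketch below (mine; the constant c in (19) is not specified on the slides, so only the empirical decay is shown) estimates the deviation probabilities.

```python
# Minimal sketch (assumes numpy): for Gaussian rows and a unit vector x, <Y_i, x> ~ N(0, 1) i.i.d.,
# so m^{-1}||Mx||_2^2 - 1 is an average of centered squared Gaussians.
import numpy as np

rng = np.random.default_rng(0)
m, trials = 200, 20_000
inner = rng.standard_normal((trials, m))          # <Y_i, x>, i = 1..m, for 'trials' draws of M
deviation = np.abs((inner ** 2).mean(axis=1) - 1.0)
for t in (0.1, 0.2, 0.3):
    print(f"t = {t}: empirical P(|m^-1 ||Mx||^2 - 1| >= t) = {np.mean(deviation >= t):.4f}")
```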


Ingredients of the proof: covering argument

Let M ∈ Rm×d be random and

P(|m^{−1}‖Mx‖2² − ‖x‖2²| ≥ t‖x‖2²) ≤ 2 exp(−ct²m) for all x ∈ Rd.

Define M̃ = (1/√m)M. Then

P(|‖M̃x‖2² − ‖x‖2²| ≥ t‖x‖2²) ≤ 2 exp(−ct²m) for all x ∈ Rd.

For S ⊂ {1, . . . , d}, |S| = s and δ, ε ∈ (0, 1), if

m ≥ Cδ^{−2}(7s + 2 ln(2ε^{−1})), (20)

then w.p. at least 1 − ε

‖M̃_S^T M̃_S − Id‖_{2→2} < δ. (21)


Ingredients of the proof: union bound

Let M̃ ∈ Rm×d be random and

P(|‖M̃x‖2² − ‖x‖2²| ≥ t‖x‖2²) ≤ 2 exp(−ct²m) for all x ∈ Rd.

If for δ, ε ∈ (0, 1),

m ≥ Cδ^{−2} [s(9 + 2 ln(d/s)) + 2 ln(2ε^{−1})], (22)

then w.p. at least 1 − ε, the restricted isometry constant δs of M̃ satisfies δs < δ.


Gaussian width

For T ⊂ Rd we define its Gaussian width by

ℓ(T) := E sup_{x∈T} 〈x, g〉, g ∈ Rd is Gaussian. (23)

[Figure: the width of T in a direction u.]

Due to the rotation invariance, (23) can be written as

ℓ(T) = E‖g‖2 · E sup_{x∈T} 〈x, u〉,

where u is uniformly distributed on S^{d−1}.

ℓ(S^{d−1}) = E sup_{‖x‖2=1} 〈x, g〉 = E‖g‖2 ∼ √d

D := conv{x ∈ S^{d−1} : |supp x| ≤ s}, ℓ(D) ∼ √(s ln(d/s))
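The width of D can be estimated by Monte Carlo: sup_{x∈D} 〈x, g〉 equals the ℓ2-norm of the s entries of g with largest magnitude. The sketch below (my own; parameters are illustrative) compares the estimate with the explicit bound √(2s ln(ed/s)) + √s used later.

```python
# Minimal sketch (assumes numpy): Monte Carlo estimate of l(D); for this D,
# sup_{x in D} <x, g> = sqrt(sum of the s largest g_i^2).
import numpy as np

rng = np.random.default_rng(0)
d, s, trials = 1000, 10, 5000
g = rng.standard_normal((trials, d))
sup_values = np.sqrt(np.sort(g ** 2, axis=1)[:, -s:].sum(axis=1))
print("Monte Carlo estimate of l(D):", sup_values.mean())
print("bound sqrt(2 s ln(ed/s)) + sqrt(s):", np.sqrt(2 * s * np.log(np.e * d / s)) + np.sqrt(s))
```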


Gordon’s escape through a mesh

ℓ(T) := E sup_{x∈T} 〈x, g〉, g ∈ Rd is Gaussian.

E_m := E‖g‖2 = √2 Γ((m + 1)/2)/Γ(m/2), g ∈ Rm is Gaussian,

m/√(m + 1) ≤ E_m ≤ √m.

Theorem

Let M ∈ Rm×d be Gaussian and T ⊂ S^{d−1}. Then, for t > 0, it holds

P(inf_{x∈T} ‖Mx‖2 > E_m − ℓ(T) − t) ≥ 1 − e^{−t²/2}. (24)

The proof relies on the concentration of measure inequality for Lipschitz functions.

m is determined by:

E_m ≥ m/√(m + 1) ≥ ℓ(T) + t + 1/τ   (m ≳ ℓ(T)²)


Estimates for Gaussian widths of T (x)

T (x) = cone{z − x : z ∈ Rd , ‖z‖1 ≤ ‖x‖1} (25)

N(x) := {z ∈ Rd : 〈z, w − x〉 ≤ 0 for all w s.t. ‖w‖1 ≤ ‖x‖1} (26)

ℓ(T(x) ∩ S^{d−1}) ≤ E min_{z∈N(x)} ‖g − z‖2, where g ∈ Rd is a standard Gaussian random vector.

Let supp(x) = S . Then

N(x) = ⋃_{t≥0} {z ∈ Rd : z_i = t sgn(x_i), i ∈ S, |z_i| ≤ t, i ∈ S^c}

[ℓ(T(x) ∩ S^{d−1})]² ≤ 2s ln(ed/s)
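Using the parametrization of N(x) above, the distance from g to N(x) can be minimized over the single parameter t ≥ 0, which gives a Monte Carlo upper bound on ℓ(T(x) ∩ S^{d−1}). The sketch below is my own illustration (support and signs of x are fixed WLOG by symmetry of g; parameters are arbitrary) and compares the estimate with √(2s ln(ed/s)).

```python
# Minimal sketch (assumes scipy): Monte Carlo estimate of E min_{z in N(x)} ||g - z||_2 for an s-sparse x
# (support = first s coordinates, positive signs), compared with sqrt(2 s ln(ed/s)).
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
d, s, trials = 500, 10, 300
sgn = np.ones(s)                              # sgn(x_i) on the support

def dist_to_normal_cone(g):
    def sq_dist(t):                           # squared distance to the slice of N(x) at level t
        on_support = np.sum((g[:s] - t * sgn) ** 2)
        off_support = np.sum(np.maximum(np.abs(g[s:]) - t, 0.0) ** 2)
        return on_support + off_support
    res = minimize_scalar(sq_dist, bounds=(0.0, np.abs(g).max() + 10.0), method="bounded")
    return np.sqrt(res.fun)

estimate = np.mean([dist_to_normal_cone(rng.standard_normal(d)) for _ in range(trials)])
print("Monte Carlo upper bound on the width:", estimate)
print("sqrt(2 s ln(ed/s)):", np.sqrt(2 * s * np.log(np.e * d / s)))
```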


Nonuniform recovery with Gaussian measurements

Theorem

Let x ∈ Rd be an s-sparse vector. Let M ∈ Rm×d be a randomly drawn Gaussian matrix. If, for some ε ∈ (0, 1),

m²/(m + 1) ≥ 2s (√(ln(ed/s)) + √(ln(ε^{−1})/s))², (27)

then w.p. at least 1 − ε the vector x is the unique minimizer of ‖z‖1 subject to Mz = Mx.
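For concreteness, the smallest m allowed by condition (27) can be computed for example values of s, d and ε (my own sketch; the parameter choices are arbitrary):

```python
# Minimal sketch (assumes numpy): smallest m satisfying (27) for illustrative s, d, epsilon.
import numpy as np

s, d, eps = 10, 1000, 0.01
rhs = 2 * s * (np.sqrt(np.log(np.e * d / s)) + np.sqrt(np.log(1 / eps) / s)) ** 2
m = int(np.ceil(rhs))                  # m^2/(m+1) is increasing in m
while m ** 2 / (m + 1) < rhs:
    m += 1
print(f"condition (27) requires m >= {m} (right-hand side = {rhs:.1f})")
```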


Estimates for Gaussian widths of Tρ,s

T_{ρ,s} := {w ∈ Rd : ‖w_S‖1 ≥ ρ‖w_{S^c}‖1 for some S ⊂ [d], |S| = s} (28)

D := conv{x ∈ S^{d−1} : |supp(x)| ≤ s} (29)

T_{ρ,s} ∩ S^{d−1} ⊂ (1 + ρ^{−1})D

ℓ(D) ≤ √(2s ln(ed/s)) + √s

ℓ(T_{ρ,s} ∩ S^{d−1}) ≤ (1 + ρ^{−1})(√(2s ln(ed/s)) + √s)


Uniform recovery with Gaussian measurements

Theorem

Let M ∈ Rm×d be Gaussian, 0 < ρ < 1 and 0 < ε < 1. If

m²/(m + 1) ≥ 2s (1 + ρ^{−1})² (√(ln(ed/s)) + 1/√2 + √(ln(ε^{−1})/(s(1 + ρ^{−1})²)))²,

then w.p. at least 1 − ε for every x ∈ Rd a minimizer x̂ of ‖z‖1 subject to Mz = Mx approximates x with ℓ1-error

‖x − x̂‖1 ≤ (2(1 + ρ)/(1 − ρ)) σ_s(x)1.
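As with the nonuniform case, the measurement count required by this theorem can be evaluated for example parameters, together with the error constant 2(1 + ρ)/(1 − ρ) from (3) (my own sketch; parameter values are arbitrary):

```python
# Minimal sketch (assumes numpy): measurements required by the uniform recovery theorem for
# illustrative parameters, and the resulting l1-error constant 2(1+rho)/(1-rho).
import numpy as np

s, d, eps, rho = 10, 1_000_000, 0.01, 0.5
a = (1 + 1 / rho) ** 2
rhs = 2 * s * a * (np.sqrt(np.log(np.e * d / s)) + 1 / np.sqrt(2)
                   + np.sqrt(np.log(1 / eps) / (s * a))) ** 2
m = int(np.ceil(rhs))
while m ** 2 / (m + 1) < rhs:
    m += 1
print(f"uniform recovery needs m >= {m}; error constant = {2 * (1 + rho) / (1 - rho):.1f}")
```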


Thank you for your attention !!!

