From Seismology to Compressed Sensing and Back, a Brief History of Optimization-Based Signal Processing

Carlos Fernandez-Granda
www.cims.nyu.edu/~cfgranda

Signal and Information Processing Seminar, Rutgers University

4/27/2016

Deconvolution in seismology

Compressed sensing

Back to deconvolution: the super-resolution problem

Super-resolution via semidefinite programming

Demixing of sines and spikes

Deconvolution in seismology

Seismology

Reflection seismology

[Figure: geological section, acoustic impedance, and reflection coefficients]

[Figure: sensing, with the reflection coefficients, the pulse, and the data]

Data ≈ convolution of pulse and reflection coefficients

Sensing model for reflection seismology

Reflection coefficients ∗ pulse = data

Spectrum of reflection coefficients × spectrum of pulse = spectrum of data

Convolution in time = pointwise multiplication in frequency
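A small numerical check of the statement above, as a sketch: it assumes a periodic (circular) convolution on a grid of 64 samples, with made-up reflection coefficients and a made-up Gaussian pulse. None of these values come from the slides.

```python
import numpy as np

n = 64

# Hypothetical sparse reflection coefficients and a smooth pulse.
reflectivity = np.zeros(n)
reflectivity[[5, 20, 41]] = [1.0, -0.7, 0.4]
t = np.arange(n)
pulse = np.exp(-0.5 * ((t - 8) / 2.0) ** 2)

# Circular convolution computed directly from its definition.
data_time = np.array([
    sum(reflectivity[j] * pulse[(k - j) % n] for j in range(n))
    for k in range(n)
])

# Same data obtained by multiplying the two spectra pointwise.
data_freq = np.fft.ifft(np.fft.fft(reflectivity) * np.fft.fft(pulse)).real

print(np.allclose(data_time, data_freq))
```

Frequencies at which the pulse spectrum is (nearly) zero carry (nearly) no information about the reflection coefficients, which is where the ill-posedness comes from.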

Ill-posed problem! How do we choose between signals consistent with data?

Geophysicists: Minimize the ℓ1 norm

Minimum ℓ1-norm estimate

minimize ||estimate||_1 subject to estimate ∗ pulse = data

[Figure: reflection coefficients and estimate]

It works, but under what conditions?
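A minimal sketch of the minimum ℓ1-norm estimate as a linear program. The assumptions are mine, not the slides': a periodic grid, a hypothetical band-limited pulse whose spectrum is 1 for |k| ≤ fc and 0 elsewhere, real-valued spikes, and SciPy's `linprog` as the solver. Because the pulse is band-limited, the constraint estimate ∗ pulse = data only pins down the frequencies |k| ≤ fc, so it is imposed there.

```python
import numpy as np
from scipy.optimize import linprog

n, fc = 64, 12                      # grid size and (assumed) pulse bandwidth

# Hypothetical sparse reflection coefficients with well-separated spikes.
x_true = np.zeros(n)
x_true[[6, 22, 37, 55]] = [1.0, -0.6, 0.8, -0.4]

# Hypothetical band-limited pulse: its spectrum vanishes for |k| > fc.
k_all = np.fft.fftfreq(n, d=1.0 / n)             # integer frequencies
pulse_hat = np.where(np.abs(k_all) <= fc, 1.0, 0.0)
data = np.fft.ifft(np.fft.fft(x_true) * pulse_hat).real   # x_true * pulse

# For a real estimate it is enough to match the frequencies k = 0, ..., fc.
k = np.arange(fc + 1)
F = np.exp(-2j * np.pi * np.outer(k, np.arange(n)) / n)
y = np.fft.fft(data)[: fc + 1]
A = np.vstack([F.real, F.imag[1:]])              # the Im row at k = 0 is all zeros
b = np.concatenate([y.real, y.imag[1:]])

# Linear program: x = p - q with p, q >= 0, minimize sum(p) + sum(q) = ||x||_1.
res = linprog(c=np.ones(2 * n), A_eq=np.hstack([A, -A]), b_eq=b,
              bounds=(0, None), method="highs")
x_hat = res.x[:n] - res.x[n:]

print("l1 norm of truth:   ", np.abs(x_true).sum())
print("l1 norm of estimate:", np.abs(x_hat).sum())
print("max abs error:      ", np.abs(x_hat - x_true).max())
```

Moving the spikes closer together in `x_true` is a quick way to probe the question on the slide: the estimate stays feasible, but it need not coincide with the truth.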


Compressed sensing

Magnetic resonance imaging

Images are sparse/compressible

Wavelet coefficients

Magnetic resonance imaging

Data: Samples from spectrum

Problem: Sampling is time consuming (annoying, patient might move)

Images are compressible (≈ sparse)

Can we recover compressible signals from less data?

Compressed sensing

1. Undersample the spectrum randomly

[Figure: randomly undersampled data, 1D and 2D]

Compressed sensing

2. Solve the optimization problem

minimize ||estimate||_1 subject to frequency samples of estimate = data

[Figure: signal and estimate]

Compressed sensing in MRI

×2 undersampling

Theoretical questions

1. Is the problem well posed?

2. Does ℓ1-norm minimization work?

Is the problem well posed?

[Figure: measurement operator acting on x; panel label: spectrum of x]

Measurement operator = random frequency samples

What is the effect of the measurement operator on sparse vectors?

Are sparse submatrices always well conditioned?

Restricted isometry property (RIP)

An m × n matrix A satisfies the restricted isometry property if there is 0 < δ < 1 such that for any s-sparse vector x

(1 − δ) ||x||_2 ≤ ||Ax||_2 ≤ (1 + δ) ||x||_2

Random Fourier matrices satisfy the RIP with high probability if m ≥ O(s) up to log factors (Candès, Tao 2006)

2s-RIP implies that for any s-sparse signals x1, x2 with measurements y1 = A x1 and y2 = A x2 (the difference x2 − x1 is 2s-sparse)

||y2 − y1||_2 ≥ (1 − δ) ||x2 − x1||_2
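An empirical look at the conditioning of sparse submatrices, as a sketch only. It samples random supports of a random partial Fourier matrix and records the extreme singular values; this does not certify the RIP, which would require checking every support. The sizes n, m, s and the number of trials are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, s = 256, 64, 8          # ambient dimension, measurements, sparsity (hypothetical)

# Random partial Fourier matrix: m rows of the n x n DFT chosen at random,
# scaled so that every column has unit l2 norm.
rows = rng.choice(n, size=m, replace=False)
A = np.exp(-2j * np.pi * np.outer(rows, np.arange(n)) / n) / np.sqrt(m)

# Extreme singular values over randomly drawn s-column submatrices.
sigma_min, sigma_max = np.inf, 0.0
for _ in range(200):
    support = rng.choice(n, size=s, replace=False)
    sv = np.linalg.svd(A[:, support], compute_uv=False)
    sigma_min = min(sigma_min, sv[-1])
    sigma_max = max(sigma_max, sv[0])

# On the sampled supports: sigma_min ||x||_2 <= ||Ax||_2 <= sigma_max ||x||_2.
print(f"smallest singular value seen: {sigma_min:.3f}")
print(f"largest singular value seen:  {sigma_max:.3f}")
```

The gap between the two printed values and 1 plays the role of δ on the sampled supports.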

Theoretical questions

1. Is the problem well posed?

2. Does ℓ1-norm minimization work?

Characterizing the minimum ℓ1-norm estimate

- Aim: Show that the original signal x is the solution of

  minimize ||x′||_1 subject to A x′ = y

- This is guaranteed by the existence of a dual certificate

Dual certificate

$v \in \mathbb{R}^m$ is a dual certificate associated to x if

$$q := A^T v$$

satisfies

$$q_i = \operatorname{sign}(x_i) \ \text{if } x_i \neq 0, \qquad |q_i| < 1 \ \text{if } x_i = 0$$

Dual certificate

q is a subgradient of the ℓ1 norm at x

For any x + h such that Ah = 0

$$\|x + h\|_1 \ge \|x\|_1 + q^T h = \|x\|_1 + v^T A h = \|x\|_1$$

If $A_T$ (the columns of A indexed by the support T of x) is injective, x is the unique solution

Dual certificate for compressed sensing

Aim: Show that a dual certificate exists for any sparse support and sign pattern

Idea: Minimum-energy interpolator has closed-form solution

Valid certificate if m ≥ O(s) up to log factors

(Candès, Romberg, Tao 2006)
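A sketch of the closed-form interpolator, under assumptions of my own: a complex random partial Fourier matrix (so the transpose becomes a conjugate transpose), a random support and a random ±1 sign pattern. Here "minimum energy" is taken to mean the v of smallest ℓ2 norm such that q = A∗v equals the sign pattern on the support, which gives v = A_T (A_T∗ A_T)⁻¹ sign(x_T).

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, s = 256, 80, 6          # hypothetical sizes

rows = rng.choice(n, size=m, replace=False)
A = np.exp(-2j * np.pi * np.outer(rows, np.arange(n)) / n) / np.sqrt(m)

support = np.sort(rng.choice(n, size=s, replace=False))
signs = rng.choice([-1.0, 1.0], size=s)

# Least-norm v with (A^* v) = signs on the support, in closed form.
A_T = A[:, support]
v = A_T @ np.linalg.solve(A_T.conj().T @ A_T, signs)
q = A.conj().T @ v

off_support = np.setdiff1d(np.arange(n), support)
print("interpolation error on support:", np.abs(q[support] - signs).max())
print("largest |q_i| off the support: ", np.abs(q[off_support]).max())
```

The interpolation on the support holds by construction; whether the second printed value stays below 1 is exactly what the theorem controls when m is large enough relative to s.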

Back to deconvolution: the super-resolution problem

Limits of resolution in imaging

The resolving power of lenses, however perfect, is limited (Lord Rayleigh)

Diffraction imposes a fundamental limit on the resolution of optical systems

Fluorescence microscopy

[Figure: point sources, low-pass blur, and data]

(Figures courtesy of V. Morgenshtern)

Super-resolution

- Optics: Data-acquisition techniques to overcome the diffraction limit

- Image processing: Methods to upsample images onto a finer grid while preserving edges and hallucinating textures

- This talk: Estimation/deconvolution from low-pass measurements

Sensing model for super-resolution

Point sources ∗ point-spread function = data

Spectrum of point sources × spectrum of point-spread function = spectrum of data

Deconvolution problem as in reflection seismology

Minimum ℓ1-norm estimate

minimize ||estimate||_1 subject to estimate ∗ psf = data

[Figure: point sources and estimate]

Mathematical model

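- Signal: superposition of Dirac measures with support T

$$x = \sum_j a_j \delta_{t_j}, \qquad a_j \in \mathbb{C},\ t_j \in T \subset [0, 1]$$

- Data: low-pass Fourier coefficients with cut-off frequency $f_c$

$$y = \mathcal{F}_c\, x$$

$$y(k) = \int_0^1 e^{-i 2\pi k t}\, x(\mathrm{d}t) = \sum_j a_j e^{-i 2\pi k t_j}, \qquad k \in \mathbb{Z},\ |k| \le f_c$$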
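The measurement model can be simulated in a few lines. This is a sketch with made-up spike locations, amplitudes and cut-off frequency; only the formula for y(k) is taken from the slide.

```python
import numpy as np

fc = 10                                        # hypothetical cut-off frequency
t = np.array([0.12, 0.35, 0.60, 0.83])         # hypothetical spike locations in [0, 1)
a = np.array([1.0, -0.5, 0.8 + 0.3j, 0.4])     # hypothetical complex amplitudes

k = np.arange(-fc, fc + 1)                     # frequencies |k| <= fc
# y(k) = sum_j a_j exp(-i 2 pi k t_j)
y = np.exp(-2j * np.pi * np.outer(k, t)) @ a

print(y.shape)   # n = 2 fc + 1 measurements
```

The zero-frequency coefficient `y[fc]` equals the sum of the amplitudes, a quick sanity check on the indexing.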

Compressed sensing vs super-resolution

Compressed sensing: spectrum interpolation

Super-resolution: spectrum extrapolation

Total-variation norm

- Continuous counterpart of the ℓ1 norm

- If $x = \sum_j a_j \delta_{t_j}$ then $\|x\|_{TV} = \sum_j |a_j|$

- Not the total variation of a piecewise-constant function

- Formal definition: For a complex measure ν

$$\|\nu\|_{TV} = \sup \sum_{j=1}^{\infty} |\nu(B_j)|,$$

(supremum over all finite partitions $B_j$ of [0, 1])

Theoretical questions

1. Is the problem well posed?

2. Does TV-norm minimization work?

Is the problem well posed?

[Figure: measurement operator acting on x; panel label: spectrum of x]

Measurement operator = low-pass samples with cut-off frequency fc

Effect of measurement operator on sparse vectors?

Submatrix can be very ill conditioned!

If support is spread out there is hope

Minimum separation

The minimum separation ∆ of the support of x is

$$\Delta = \inf_{(t, t') \in \operatorname{support}(x) \,:\, t \neq t'} |t - t'|$$

Conditioning of submatrix with respect to ∆

- If ∆ < 1/fc the problem is ill posed
- If ∆ > 1/fc the problem becomes well posed
- Proved asymptotically by Slepian and non-asymptotically by Moitra

1/fc is the diameter of the main lobe of the point-spread function (twice the Rayleigh distance)

Lower bound on ∆

- Above what minimum distance ∆ is the problem well posed?

- Numerical lower bound on ∆:

  1. Compute singular values of restricted operator for different values of ∆diff

  2. Find ∆diff under which the restricted operator is ill conditioned

  3. Then ∆ ≥ 2∆diff
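Step 1 of the procedure above can be sketched as follows. The setup is a simplification of my own: s equispaced spikes with spacing `delta`, and a much smaller cut-off frequency than the fc = 10^3 used in the slides, so that it runs instantly. The restricted operator is the matrix of low-pass sinusoids evaluated on the support.

```python
import numpy as np

fc, s = 20, 8                                  # hypothetical cut-off and number of spikes
k = np.arange(-fc, fc + 1)

def condition_number(delta):
    """Condition number of F_c restricted to s equispaced spikes with spacing delta."""
    t = delta * np.arange(s)
    F_T = np.exp(-2j * np.pi * np.outer(k, t))
    sv = np.linalg.svd(F_T, compute_uv=False)
    return sv[0] / sv[-1]

for scaled in [0.1, 0.25, 0.5, 1.0, 2.0]:      # values of delta * fc
    cond = condition_number(scaled / fc)
    print(f"delta * fc = {scaled:4.2f}   condition number = {cond:.2e}")
```

Plotting the individual singular values instead of their ratio, for a finer sweep of `delta * fc`, gives the kind of curves shown on the next slides.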

Singular values of the restricted operator

Number of spikes = s, fc = 10^3

[Figure: singular values σj of the restricted operator (j = 1, 0.25 s, 0.5 s, 0.75 s, s) as a function of ∆ fc between 0.1 and 0.6, on a logarithmic scale from 10^−17 to 10^−1; panels for s = 40, s = 100, s = 200 and s = 500]

Phase transition at ∆diff = 1/(2fc) → ∆ = 1/fc

Example: 25 spikes, fc = 10^3, ∆ = 0.8/fc

[Figure: two signals (Signal 1, Signal 2) and their data (in signal space)]

The difference is almost in the null space of the measurement operator

[Figure: difference between the two signals and its spectrum; the spectrum is plotted on a logarithmic scale from 10^−8 to 10^0 for frequencies between −2,000 and 2,000]

Theoretical questions

1. Is the problem well posed?

2. Does TV-norm minimization work?

Estimation via convex programming

For data of the form $y = \mathcal{F}_c\, x$, we solve

$$\min_x \|x\|_{TV} \quad \text{subject to} \quad \mathcal{F}_c\, x = y,$$

over all finite complex measures x supported on [0, 1]

Dual certificate of TV norm

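A dual certificate of the TV norm at

$$x = \sum_j a_j \delta_{t_j}, \qquad a_j \in \mathbb{C},\ t_j \in T$$

guarantees that x is the unique solution if

$$q := \mathcal{F}_c^{*}\, v = \sum_{|k| \le f_c} v_k e^{i 2\pi k t}$$

satisfies

$$q(t_j) = \operatorname{sign}(a_j) \ \text{if } t_j \in T, \qquad |q(t)| < 1 \ \text{if } t \notin T$$

Range of $\mathcal{F}_c^{*}$ is spanned by low-pass sinusoids instead of random sinusoids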

Certificate for super-resolution

Aim: Interpolate sign pattern

[Figure: sign pattern on the support and interpolating certificate, with values between −1 and 1]

1st idea: Interpolation with a low-frequency fast-decaying kernel K

$$q(t) = \sum_{t_j \in T} \alpha_j K(t - t_j)$$

Problem: Magnitude of certificate locally exceeds 1

Solution: Add correction term and force the derivative of the certificate to equal zero on the support

$$q(t) = \sum_{t_j \in T} \alpha_j K(t - t_j) + \beta_j K'(t - t_j)$$

Similar construction for bandpass point-spread functions relevant to reflection seismology
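A sketch of the corrected interpolation q(t) = Σ αj K(t − tj) + βj K′(t − tj) for a low-pass certificate. The kernel is an assumption on my part (a squared Fejér kernel, built from its Fourier coefficients), as are the support, the sign pattern and the cut-off frequency. The coefficients α and β are obtained by solving the linear system q(tj) = sign_j, q′(tj) = 0.

```python
import numpy as np

fc = 40                                                  # hypothetical cut-off frequency
t_sup = np.array([0.10, 0.18, 0.31, 0.47, 0.62, 0.80])   # separations >= 2 / fc = 0.05
signs = np.array([1.0, -1.0, 1.0, 1.0, -1.0, 1.0])

# Assumed kernel: squared Fejer kernel. Its Fourier coefficients are a triangle
# convolved with itself (supported on |k| <= fc), scaled so that K(0) = 1.
half = fc // 2
tri = 1.0 - np.abs(np.arange(-half, half + 1)) / (half + 1)
c = np.convolve(tri, tri)
c /= c.sum()
k = np.arange(-fc, fc + 1)

def kernel(t, order=0):
    """Evaluate the order-th derivative of K at the points t."""
    t = np.asarray(t, dtype=float)
    phases = np.exp(2j * np.pi * t[..., None] * k)
    return (phases * ((2j * np.pi * k) ** order * c)).sum(axis=-1).real

# Solve for alpha, beta such that q(t_j) = sign_j and q'(t_j) = 0 on the support.
D = t_sup[:, None] - t_sup[None, :]
M = np.block([[kernel(D, 0), kernel(D, 1)],
              [kernel(D, 1), kernel(D, 2)]])
coef = np.linalg.solve(M, np.concatenate([signs, np.zeros(len(t_sup))]))
alpha, beta = coef[: len(t_sup)], coef[len(t_sup):]

def q(t, order=0):
    """Evaluate the order-th derivative of the certificate at the points t."""
    diff = np.asarray(t, dtype=float)[:, None] - t_sup[None, :]
    return kernel(diff, order) @ alpha + kernel(diff, order + 1) @ beta

grid = np.linspace(0.0, 1.0, 4001)
print("max |q(t_j) - sign_j|:", np.abs(q(t_sup) - signs).max())
print("max |q'(t_j)|:        ", np.abs(q(t_sup, 1)).max())
print("max |q(t)| on a grid: ", np.abs(q(grid)).max())
```

The first two printed values are small by construction. The third is the quantity the proof has to bound by 1 away from the support, and it depends on the kernel and on the minimum separation.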

Sketch of proof: Interpolation kernel

Key step: Designing a good interpolation kernel

[Figure: the kernel as a product of three kernels (equivalently, a convolution of three spectra) with cut-off frequencies 0.273 fc, 0.36 fc and 0.367 fc, which add up to fc]

Trade-off between spikiness at the origin and asymptotic decay

Sketch of proof: Non-asymptotic bounds on kernel

[Figure 1: Upper and lower bounds on the kernel K and its first, second and third derivatives, for fc = 10^3, fc = 5·10^3 and fc = 10^4]

Guarantees for super-resolution

Theorem [Candès, F. 2012]

If the minimum separation of the signal support obeys

∆ ≥ 2/fc

then recovery via convex programming is exact

Theorem [F. 2016]

If the minimum separation of the signal support obeys

∆ ≥ 1.26/fc,

then recovery via convex programming is exact

Theorem [Candès, F. 2012]

In 2D convex programming super-resolves point sources with a minimum separation of

∆ ≥ 2.38/fc

where fc is the cut-off frequency of the low-pass kernel

Numerical evaluation of minimum separation

[Figure: three panels (fc = 30, fc = 40, fc = 50); horizontal axis: minimum separation ∆min fc, from 0.2 to 1; vertical axis: number of spikes; color scale from 0 to 1]

Conjecture: TV-norm minimization succeeds if ∆ ≥ 1/fc

Dual certificate as theoretical tool

Subsequent work builds on our construction to analyze

- Stability of super-resolution [Candès, F. 2013], [F. 2013], [Azais, De Castro, Gamboa 2013], [Duval, Peyré 2013]
- Denoising of line spectra [Tang, Bhaskar, Recht 2013]
- Compressed sensing off the grid [Tang, Bhaskar, Shah, Recht 2013]
- Recovery of splines from their projection onto spaces of algebraic polynomials [Bendory, Dekel, Feuer 2013], [De Castro, Mijoule 2014]
- Recovery of point sources from spherical harmonics [Bendory, Dekel, Feuer 2013]

Super-resolution via semidefinite programming

Practical implementation

- Primal problem:

$$\min_x \|x\|_{TV} \quad \text{subject to} \quad \mathcal{F}_c\, x = y$$

  Infinite-dimensional variable x (measure in [0, 1])

  First option: Discretizing + ℓ1-norm minimization

- Dual problem:

$$\max_{u \in \mathbb{C}^n} \operatorname{Re}[y^{*} u] \quad \text{subject to} \quad \|\mathcal{F}_c^{*}\, u\|_\infty \le 1, \qquad n := 2 f_c + 1$$

  Finite-dimensional variable u, but infinite-dimensional constraint

$$\mathcal{F}_c^{*}\, u = \sum_{|k| \le f_c} u_k e^{i 2\pi k t}$$

  Second option: Solving the dual problem

Lemma: Semidefinite representation

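The Fejér-Riesz Theorem and the semidefinite representation of polynomial sums of squares imply that

$$\|\mathcal{F}_c^{*}\, u\|_\infty \le 1$$

is equivalent to

There exists a Hermitian matrix $Q \in \mathbb{C}^{n \times n}$ such that

$$\begin{bmatrix} Q & u \\ u^{*} & 1 \end{bmatrix} \succeq 0, \qquad \sum_{i=1}^{n-j} Q_{i,i+j} = \begin{cases} 1, & j = 0, \\ 0, & j = 1, 2, \ldots, n - 1. \end{cases}$$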

Consequence: The dual problem is a tractable semidefinite program

Support-locating polynomial

How do we obtain an estimator from the dual solution?

Dual solution vector: Fourier coefficients of low-pass polynomial that interpolates the sign of the primal solution (follows from strong duality)

Idea: Use the polynomial to locate the support of the signal

Super-resolution via semidefinite programming

1. Solve semidefinite program to obtain dual solution

2. Locate points at which corresponding polynomial has unit magnitude

3. Estimate amplitudes via least squares

[Figure: signal and estimate]
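Step 3 is a plain least-squares fit once the support is known. The sketch below covers only that step: it assumes steps 1 and 2 have already returned the support (here simply set to the true locations), and the cut-off frequency, locations and amplitudes are made up for illustration.

```python
import numpy as np

fc = 15                                        # hypothetical cut-off frequency
k = np.arange(-fc, fc + 1)

# Hypothetical ground truth, used only to simulate the data y.
t_true = np.array([0.15, 0.40, 0.55, 0.90])
a_true = np.array([1.0 + 0.5j, -0.7, 0.3j, 0.9])
y = np.exp(-2j * np.pi * np.outer(k, t_true)) @ a_true

# Assumption: steps 1 and 2 returned these support estimates.
t_hat = t_true.copy()

# Step 3: least-squares fit of the amplitudes given the estimated support.
F_T = np.exp(-2j * np.pi * np.outer(k, t_hat))
a_hat, *_ = np.linalg.lstsq(F_T, y, rcond=None)

print("max amplitude error:", np.abs(a_hat - a_true).max())
```

With an inexact support estimate the same call still applies; the amplitude error then reflects the support-location error.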

Support-location accuracy

fc               25             50             75             100
Average error    6.66 × 10^−9   1.70 × 10^−9   5.58 × 10^−10  2.96 × 10^−10
Maximum error    1.83 × 10^−7   8.14 × 10^−8   2.55 × 10^−8   2.31 × 10^−8

For each fc, 100 random signals with |T| = fc/4 and ∆(T) ≥ 2/fc

Demixing of sines and spikes

Spectral super-resolution

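- Signal: Multisinusoidal signal

$$g(t) := \sum_{f_j \in T} c_j e^{-i 2\pi f_j t}$$

  with spectrum

$$\hat{g} = \sum_{f_j \in T} c_j \delta_{f_j}$$

- Data: n samples measured at Nyquist rate

$$g(k) := \sum_{f_j \in T} c_j e^{-i 2\pi k f_j}, \qquad 1 \le k \le n$$

Equivalent to our super-resolution model!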

Spectral super-resolution

[Figure: spectrum, signal, and data]

Demixing of sines and spikes

Aim: Super-resolving the spectrum of a multi-sinusoidal signal (sines) in the presence of impulsive events (spikes)

y = Fc x + s

[Figure: sines + spikes = data; the sines are Fc x, where x is the spectrum, so that Fc x + s = y]

Demixing of sines and spikes

Estimator: Solution to

$$\min_{x,\, s} \|x\|_{TV} + \gamma \|s\|_1 \quad \text{subject to} \quad \mathcal{F}_c\, x + s = y$$

Dual problem:

$$\max_{u \in \mathbb{C}^n} \operatorname{Re}[y^{*} u] \quad \text{subject to} \quad \|\mathcal{F}_c^{*}\, u\|_\infty \le 1, \quad \|u\|_\infty \le \gamma$$

Dual solution: u

- u interpolates the sign of the primal solution s
- $\mathcal{F}_c^{*}\, u$ interpolates the sign of the primal solution x

[Figure: dual solution (u and F∗c u) and estimate (s and x), for the spikes and for the sines (spectrum)]

Conclusion

- Geophysicists pioneered the use of ℓ1-norm regularization for underdetermined inverse problems

- Mathematicians and statisticians developed theoretical tools to understand compressed sensing

- Adapting these insights allows us to analyze the potential and limitations of convex programming for super-resolution

Deconvolution with the ℓ1 norm (Taylor, Banks, McCoy ’79)

[Figure: data, fit, pulse, and estimate]

References: Reflection seismology

- Robust modeling with erratic data. J. F. Claerbout and F. Muir. Geophysics, 1973

- Deconvolution with the ℓ1 norm. H. L. Taylor, S. C. Banks and J. F. McCoy. Geophysics, 1979

- Reconstruction of a sparse spike train from a portion of its spectrum and application to high-resolution deconvolution. S. Levy and P. K. Fullagar. Geophysics, 1981

- Linear inversion of band-limited reflection seismograms. F. Santosa and W. W. Symes. SIAM J. Sci. Stat. Comp., 1986

References: Compressed sensing

- Stable signal recovery from incomplete and inaccurate measurements. E. J. Candès, J. Romberg and T. Tao. Comm. Pure Appl. Math., 2005

- Decoding by linear programming. E. J. Candès and T. Tao. IEEE Trans. Inform. Theory, 2004

- Sparse MRI: The application of compressed sensing for rapid MR imaging. M. Lustig, D. Donoho and J. M. Pauly. Magn Reson Med., 2007

References: Super-resolution

- Prolate spheroidal wave functions, Fourier analysis, and uncertainty V - The discrete case. D. Slepian. Bell System Technical Journal, 1978

- Super-resolution, extremal functions and the condition number of Vandermonde matrices. A. Moitra. Symposium on Theory of Computing (STOC), 2015

- Towards a mathematical theory of super-resolution. E. J. Candès and C. Fernandez-Granda. Comm. on Pure and Applied Math., 2013

- Super-resolution of point sources via convex programming. C. Fernandez-Granda. Information and Inference, 2016
