
Approximate Message Passing

Mohsen Bayati, David Donoho, Adel Javanmard, Iain Johnstone, Marc Lelarge, Arian Maleki, Andrea Montanari

Stanford University

December 8, 2012


Statistical estimation

y = f(θ; noise)

θ → Unknown object
y → Observations
f( · ; noise) → Parametric model

Problem: Estimate θ from observations y.


A broad convergence

- Statistics [Genomics, ...]

- Data mining [Collaborative filtering, Predictive analytics, ...]

- Signal processing [Compressive sampling, ...]

- Inverse problems [Medical imaging, Seismographic imaging, ...]

+ Data, + Computation, Exploit hidden structure

How should we think about these problems?


How should we think about these problems?

Optimization?

maximize   Likelihood(θ | y) − Complexity(θ)

'Separation principle'

- Modeler/statistician proposes convex cost function.

- Optimization expert proposes simple iterative algorithm.

- Run for 20 iterations and hope for the best.

How should we think about these problems?

Beyond separation?

y → θ̂₁ → θ̂₂ → θ̂₃ → ...

- Constrained complexity per iteration

- Fixed number of iterations (say 20)

- What is the minimum MSE achievable?

Outline

- A long example (algorithm + heuristics)

- A list of theorems/pointers

A long example


What type of example?

- Image processing (because they make nice figures)

- Compressed sensing (simple)

WILL NOT MENTION SPARSITY!


An image

[Image of the unknown object]

θ ∈ ℂⁿ,   n = 512² ≈ 2.5 × 10⁵

Noiseless linear measurements

y = Aθ

Want to reconstruct θ


Measurement structure

A = F̃ R,    F̃ = subsampled Fourier matrix

R = diag(±1, ±1, ..., ±1) = random modulation

→  y ∈ ℂᵐ,   m = 0.17 n
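Not part of the slides, but as a concrete illustration of this measurement model, here is a minimal NumPy sketch: a random ±1 modulation, an (unnormalized) FFT, and a uniformly random subset of m = 0.17 n frequencies kept. The operator is written as a matrix-free forward/adjoint pair, so A†y can be applied without ever storing the m × n matrix; all names (A, A_adj, R, rows) are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 512 * 512              # dimension of the vectorized image
m = int(0.17 * n)          # number of measurements, m = 0.17 n

R = rng.choice([+1.0, -1.0], size=n)            # random +/-1 modulation (diagonal of R)
rows = rng.choice(n, size=m, replace=False)     # which Fourier frequencies are observed

def A(theta):
    """Forward operator: modulate, take the (unnormalized) DFT, keep m rows."""
    return np.fft.fft(R * theta)[rows]

def A_adj(r):
    """Adjoint A^dagger: zero-fill the m observed frequencies, invert, undo modulation."""
    full = np.zeros(n, dtype=complex)
    full[rows] = r
    return R * np.fft.ifft(full) * n            # n * ifft = conjugate transpose of the DFT

theta = rng.standard_normal(n)                  # stand-in for the true image
y = A(theta)                                    # noiseless measurements, y in C^m
```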

An approach popular in this community

y = Aθ + z,    z ~ N(0, σ² I_{m×m})

p_{θ|Y}(θ | y) ∝ exp{ −‖y − Aθ‖²₂ / (2σ²) } · p_θ(θ)

             ∝ ∏_{a=1}^{m} exp{ −(y_a − A_aᵀ θ)² / (2σ²) } · ∏_{i=1}^{n} p_{θ_i}(θ_i)


Factor graph!

[Factor graph: factor nodes a = 1, ..., m (one per measurement y_a), variable nodes
i = 1, ..., n (one per coordinate θ_i)]

p_{θ|Y}(θ | y) ∝ ∏_{a=1}^{m} exp{ −(y_a − A_aᵀ θ)² / (2σ²) } · ∏_{i=1}^{n} p_{θ_i}(θ_i)

Use BP!


Many issues

- Does anyone know the prior distribution of natural images?

- Computation per iteration, memory Θ(mn).

- Very loopy graph.

- ...

Let us try something simpler!


Constructing a first estimate

y = Aθ

Matched filter

θ̂₁ = (1/m) A† y

How good is this?

E θ̂₁ = (one-line calculation) = θ

Check it out

θ̂₁ = A† y = [image]        θ = [image]

Does not look that good!



Idea

[Images: θ̂₁ = θ + 'noise']


How big is the `noise'?

E ‖θ̂₁ − θ‖²₂ = (two-line calculation) = ((1 − δ)/δ) · ‖θ‖²₂,    δ = m/n
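For completeness, here is one way the two-line calculation can go. This is a sketch under the assumption that the m Fourier rows are sampled uniformly at random and that the full Fourier matrix is normalized so that F†F = n I; write A = F̃R, treat θ as fixed and real, and let P = F̃†F̃ / n be the orthogonal projector onto the sampled frequencies, so that (1/m) A†A = (n/m) R P R.

```latex
\[
\hat\theta_1-\theta
  =\Big(\tfrac{1}{m}A^{\dagger}A-I\Big)\theta
  = R\Big(\tfrac{n}{m}P-I\Big)R\,\theta .
\]
Since $P^{2}=P$ and $\mathbb{E}[P]=\tfrac{m}{n}I$ over the random choice of rows,
$\mathbb{E}\big[(\tfrac{n}{m}P-I)^{2}\big]=(\tfrac{n}{m}-1)\,I$, and $R$ preserves norms, so
\[
\mathbb{E}\,\|\hat\theta_1-\theta\|_2^2
  =\theta^{\top}R\,\mathbb{E}\!\left[\Big(\tfrac{n}{m}P-I\Big)^{2}\right]R\,\theta
  =\Big(\tfrac{n}{m}-1\Big)\|\theta\|_2^2
  =\frac{1-\delta}{\delta}\,\|\theta\|_2^2 ,\qquad \delta=\frac{m}{n}.
\]
```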

Matched filter blows up noise

MSE_out = ((1 − δ)/δ) · MSE_in
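Continuing the NumPy sketch above (reusing A_adj, y, theta, m and n defined there; an illustrative check of the blow-up factor, not something done in the slides):

```python
import numpy as np

# Matched-filter estimate and its relative error, reusing the operator sketch above
delta = m / n
theta_hat_1 = A_adj(y) / m              # (1/m) A^dagger y, complex-valued in general

err = np.linalg.norm(theta_hat_1 - theta) ** 2 / np.linalg.norm(theta) ** 2
print(err, (1 - delta) / delta)         # the two numbers should be close
```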

Let's check

[Plot: empirical MSE_out vs. MSE_in for the matched filter]

Denoising

θ̂₁ ≈ θ + σ z,    z_i ~ N(0, 1)

Idea: Treat θ̂₁ as effective observations in denoising


Denoising by nonlocal means

ŷ = θ + σ z

θ̂_i = ( Σ_j W(i, j) ŷ_j ) / ( Σ_j W(i, j) )

W(i, j) = 1 if ‖Patch(i; ŷ) − Patch(j; ŷ)‖²₂ ≤ c σ²,   0 otherwise

[Buades, Coll, Morel, 2005]

θ̂ ≡ η(ŷ)
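Below is a minimal, deliberately slow Python sketch of this 0/1-weight variant of nonlocal means. The patch size, the search window, and the way the threshold is scaled (by the number of pixels per patch as well as σ²) are illustrative assumptions, not values taken from the slides or from Buades et al.

```python
import numpy as np

def nl_means_01(y_img, sigma, patch=5, window=10, c=3.0):
    """Nonlocal means with 0/1 weights: each pixel is replaced by the average of the
    pixels (in a search window) whose surrounding patch lies within c * patch^2 * sigma^2
    of its own patch in squared Euclidean distance."""
    h, w = y_img.shape
    r = patch // 2
    pad = np.pad(y_img, r, mode="reflect")
    thresh = c * patch * patch * sigma ** 2
    out = np.empty_like(y_img)
    for i in range(h):
        for j in range(w):
            p0 = pad[i:i + patch, j:j + patch]
            num = den = 0.0
            for k in range(max(0, i - window), min(h, i + window + 1)):
                for l in range(max(0, j - window), min(w, j + window + 1)):
                    pk = pad[k:k + patch, l:l + patch]
                    if np.sum((p0 - pk) ** 2) <= thresh:
                        num += y_img[k, l]
                        den += 1.0
            out[i, j] = num / den          # den >= 1: each pixel always matches itself
    return out

# Tiny usage example on a synthetic piecewise-constant image
rng = np.random.default_rng(0)
clean = np.zeros((64, 64)); clean[20:44, 20:44] = 1.0
sigma = 0.25
noisy = clean + sigma * rng.standard_normal(clean.shape)
denoised = nl_means_01(noisy, sigma)
print(np.mean((noisy - clean) ** 2), np.mean((denoised - clean) ** 2))
```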


Patches

[Figure: the image with two patches highlighted, centered at pixels i and j]

Will it work?

θ̂₂ = η(θ̂₁) = η(A† y) ≈ θ ?

Let's try

θ̂₁ = A† y = [image]        θ̂₂ = η(A† y) = [image]

Better than garbage!


How much better?

[Plot: MSE_out vs. MSE_in for the denoiser η]

How much better?

[Plot: MSE_out vs. MSE_in for the denoiser η, with reference curves c₁·x and c₂·√x. Which one fits?]

Let us repeat the denoising experiment


Let us repeat the denoising experiment: y = θ + σ z

[Figure: the noisy image y (top row) and the denoised η(y) (bottom row), for σ = 1, 0.5, 0.25, 0.12]

Quantitatively

[Plot: MSE_out vs. MSE_in for the denoiser, with reference curves c₁·x and c₂·√x]

Quantitatively

[Plot: the same data on log-log axes, with reference curves c₁·x and c₂·√x]

Approximate denoiser characterization

MSE_out = c · √MSE_in

(enough for our purposes)

(see also Maleki, Baraniuk, Narayan, 2012; Arias-Castro, Willett, 2012)

What we achieved so far

[Plots: MSE_out vs. MSE_in for the matched filter (left) and for the denoiser (right)]


What we achieved so far

[Plot: MSE_new vs. MSE_old for one step of matched filter followed by denoising]

What we achieved so far

[The same MSE_new vs. MSE_old plot on log-log axes]

What about iterating?
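Heuristically, composing the two maps established earlier (the matched-filter blow-up and the square-root denoiser characterization) suggests a one-dimensional recursion for the error from one iteration to the next. This composition is our reading of the plots above, not a claim stated in the slides:

```latex
\[
\mathrm{MSE}_{\mathrm{new}}
  \;\approx\; c\,\sqrt{\frac{1-\delta}{\delta}\,\mathrm{MSE}_{\mathrm{old}}}\,,
\qquad
\mathrm{MSE}_{\infty} \;=\; c^{2}\,\frac{1-\delta}{\delta}.
\]
```

Under this reading, the fixed point MSE_∞ is where the empirical curve crosses the diagonal in the MSE_new vs. MSE_old plots.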


How do we iterate?

Will tell you later!


t = 1, 2, ..., 9

[Figures: the reconstruction θ̂_t after each iteration t = 1, ..., 9, each shown next to the
log-log MSE_new vs. MSE_old plot tracking the iteration; the error keeps shrinking. At t = 3
the slide remarks "Non obvious!".]

t = 0, 1, 2, 3, ..., 20

[Plot: the MSE_new vs. MSE_old trajectory over 20 iterations (log-log)]

How do we iterate?


Approximate Message Passing (AMP)

θ̂^{2t}   = η(θ̂^{2t−1})

θ̂^{2t+1} = θ̂^{2t} + A† r^t

r^t = y − A θ̂^{2t} + b_t r^{t−1}

b_t = (1/m) div η(θ̂^{2t−1})

(can be computed explicitly)

[Thouless, Anderson, Palmer, 1977; Kabashima, 2003;
Donoho, Maleki, Montanari, 2009; Donoho, Johnstone, Montanari, 2012]
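A minimal, self-contained sketch of this recursion (an assumption-laden illustration, not the slides' image-denoising setup): here A is a dense Gaussian matrix with roughly unit-norm columns, and soft thresholding stands in for the denoiser η because its divergence is explicit (the number of surviving coordinates); any denoiser with a computable or estimable divergence could be plugged in instead.

```python
import numpy as np

def soft_threshold(x, lam):
    """Separable stand-in for the denoiser eta; its divergence is the number of
    coordinates left nonzero, which gives the Onsager coefficient b_t explicitly."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def amp(y, A, n_iter=20, alpha=2.0):
    """AMP recursion with a generic separable denoiser (soft thresholding here).
    Assumes A has approximately unit-norm columns, e.g. i.i.d. N(0, 1/m) entries."""
    m, n = A.shape
    theta = np.zeros(n)
    r = y.copy()
    for _ in range(n_iter):
        pseudo = theta + A.T @ r                 # effective observations
        sigma = np.sqrt(np.mean(r ** 2))         # empirical noise level of the pseudo-data
        theta = soft_threshold(pseudo, alpha * sigma)
        b = np.count_nonzero(theta) / m          # b_t = (1/m) * div eta
        r = y - A @ theta + b * r                # residual with the Onsager correction
    return theta

# Synthetic check (assumed setup): sparse theta, Gaussian A, noiseless y
rng = np.random.default_rng(0)
m, n, k = 250, 1000, 30
A = rng.standard_normal((m, n)) / np.sqrt(m)
theta_true = np.zeros(n)
theta_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
y = A @ theta_true
theta_hat = amp(y, A, n_iter=30)
print(np.linalg.norm(theta_hat - theta_true) / np.linalg.norm(theta_true))
```

Swapping soft_threshold for an image denoiser such as the nonlocal means above gives a denoising-based variant; when no explicit formula for the divergence is available, it can also be estimated numerically (for instance with a random-probe estimate).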

Connection with Belief Propagation

m_{i→j} = m_i + ε_{i→j}

Linearize in ε_{i→j}

Very different from naive mean field!


Connection with Perturbation, Optimization, Statistics?

- Robustness w.r.t. p_X ∈ DistributionClass
  (η = minimax denoiser in DistributionClass)

- Can rigorously track evolution over A random
  (What about A = A_det + ε A_rand?)


A list of theorems/pointers


A list of theorems/pointers

- Connection with optimization [Bayati, Montanari, 2011]

- Minimax theory for sparse/block-sparse/TV [Donoho, Johnstone, Montanari, 2012]

- Bayesian reconstruction up to information dimension [Donoho, Javanmard, Montanari, 2012]

- Universality [Bayati, Lelarge, Montanari, 2012]

- Analysis of Generalized AMP [Javanmard, Montanari, 2012]

- Application to sparse PCA [In preparation, 2013]

Related work

- (Non-rigorous) replica method
  [Tanaka 2002; Guo, Verdú 2005; Kabashima, Tanaka 2009; Rangan, Fletcher, Goyal 2009;
  Caire, Tulino, Shamai, Verdú 2012; ...]

- Alternative argument for robust regression (e.g. min_θ ‖y − Aθ‖₁)
  [Bean, Bickel, El Karoui, Lim, Yu 2012]

- Generalized linear models [Rangan 2011]

- Graphical model priors [Schniter et al. 2010-...]

- Low-rank matrices [Rangan, Fletcher 2012]

Conclusion


Conclusion

- Can do message passing/BP without Bayesian assumptions!

- There is something between naive mean field and BP

Thanks!

Noise distribution?

[Histogram: Frequency vs. error, with the error ranging roughly over (−5, 5)]
