
Approximate Message Passing

Mohsen Bayati, David Donoho, Adel Javanmard, Iain Johnstone, Marc Lelarge, Arian Maleki, Andrea Montanari

Stanford University

December 8, 2012


Statistical estimation

y = f(θ; noise)

θ → Unknown object
y → Observations
f( · ; noise) → Parametric model

Problem: Estimate θ from observations y.


A broad convergence

- Statistics [Genomics, ...]

- Data mining [Collaborative filtering, Predictive analytics, ...]

- Signal processing [Compressive sampling, ...]

- Inverse problems [Medical imaging, Seismographic imaging, ...]

+ Data, + Computation, Exploit hidden structure

How should we think about these problems?


How should we think about these problems?

Optimization?

maximize   Likelihood(θ | y) − Complexity(θ)

'Separation principle'

- Modeler/statistician proposes convex cost function.

- Optimization expert proposes simple iterative algorithm.

- Run for 20 iterations and hope for the best.

How should we think about these problems?

Beyond separation?

y → θ̂₁ → θ̂₂ → θ̂₃ → ...

- Constrained complexity per iteration

- Fixed number of iterations (say 20)

- What is the minimum MSE achievable?

Outline

- A long example (algorithm + heuristics)

- A list of theorems/pointers

A long example


What type of example?

- Image processing (because they make nice figures)

- Compressed sensing (simple)

WILL NOT MENTION SPARSITY!


An image

[Image of the unknown object]

θ ∈ ℂⁿ,   n = 512² ≈ 2.5 × 10⁵

Noiseless linear measurements

y = Aθ

Want to reconstruct θ


Measurement structure

A = F̃ R,    F̃ = subsampled Fourier matrix

R = diag(±1, ±1, ..., ±1) = random modulation

→  y ∈ ℂᵐ,   m = 0.17 n
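Not part of the slides, but as a concrete illustration of this measurement model, here is a minimal NumPy sketch: a random ±1 modulation, an (unnormalized) FFT, and a uniformly random subset of m = 0.17 n frequencies kept. The operator is written as a matrix-free forward/adjoint pair, so A†y can be applied without ever storing the m × n matrix; all names (A, A_adj, R, rows) are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 512 * 512              # dimension of the vectorized image
m = int(0.17 * n)          # number of measurements, m = 0.17 n

R = rng.choice([+1.0, -1.0], size=n)            # random +/-1 modulation (diagonal of R)
rows = rng.choice(n, size=m, replace=False)     # which Fourier frequencies are observed

def A(theta):
    """Forward operator: modulate, take the (unnormalized) DFT, keep m rows."""
    return np.fft.fft(R * theta)[rows]

def A_adj(r):
    """Adjoint A^dagger: zero-fill the m observed frequencies, invert, undo modulation."""
    full = np.zeros(n, dtype=complex)
    full[rows] = r
    return R * np.fft.ifft(full) * n            # n * ifft = conjugate transpose of the DFT

theta = rng.standard_normal(n)                  # stand-in for the true image
y = A(theta)                                    # noiseless measurements, y in C^m
```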

An approach popular in this community

y = Aθ + z,    z ~ N(0, σ² I_{m×m})

p_{θ|Y}(θ | y) ∝ exp{ −‖y − Aθ‖²₂ / (2σ²) } · p_θ(θ)

             ∝ ∏_{a=1}^{m} exp{ −(y_a − A_aᵀ θ)² / (2σ²) } · ∏_{i=1}^{n} p_{θ_i}(θ_i)


Factor graph!

[Factor graph: factor nodes a = 1, ..., m (one per measurement y_a), variable nodes
i = 1, ..., n (one per coordinate θ_i)]

p_{θ|Y}(θ | y) ∝ ∏_{a=1}^{m} exp{ −(y_a − A_aᵀ θ)² / (2σ²) } · ∏_{i=1}^{n} p_{θ_i}(θ_i)

Use BP!


Many issues

- Does anyone know the prior distribution of natural images?

- Computation per iteration, memory Θ(mn).

- Very loopy graph.

- ...

Let us try something simpler!


Constructing a first estimate

y = Aθ

Matched filter

θ̂₁ = (1/m) A† y

How good is this?

E θ̂₁ = (one-line calculation) = θ

Check it out

θ̂₁ = A† y = [image]        θ = [image]

Does not look that good!



Idea

[Images: θ̂₁ = θ + 'noise']


How big is the `noise'?

E ‖θ̂₁ − θ‖²₂ = (two-line calculation) = ((1 − δ)/δ) · ‖θ‖²₂,    δ = m/n
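For completeness, here is one way the two-line calculation can go. This is a sketch under the assumption that the m Fourier rows are sampled uniformly at random and that the full Fourier matrix is normalized so that F†F = n I; write A = F̃R, treat θ as fixed and real, and let P = F̃†F̃ / n be the orthogonal projector onto the sampled frequencies, so that (1/m) A†A = (n/m) R P R.

```latex
\[
\hat\theta_1-\theta
  =\Big(\tfrac{1}{m}A^{\dagger}A-I\Big)\theta
  = R\Big(\tfrac{n}{m}P-I\Big)R\,\theta .
\]
Since $P^{2}=P$ and $\mathbb{E}[P]=\tfrac{m}{n}I$ over the random choice of rows,
$\mathbb{E}\big[(\tfrac{n}{m}P-I)^{2}\big]=(\tfrac{n}{m}-1)\,I$, and $R$ preserves norms, so
\[
\mathbb{E}\,\|\hat\theta_1-\theta\|_2^2
  =\theta^{\top}R\,\mathbb{E}\!\left[\Big(\tfrac{n}{m}P-I\Big)^{2}\right]R\,\theta
  =\Big(\tfrac{n}{m}-1\Big)\|\theta\|_2^2
  =\frac{1-\delta}{\delta}\,\|\theta\|_2^2 ,\qquad \delta=\frac{m}{n}.
\]
```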

Matched filter blows up noise

MSE_out = ((1 − δ)/δ) · MSE_in
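Continuing the NumPy sketch above (reusing A_adj, y, theta, m and n defined there; an illustrative check of the blow-up factor, not something done in the slides):

```python
import numpy as np

# Matched-filter estimate and its relative error, reusing the operator sketch above
delta = m / n
theta_hat_1 = A_adj(y) / m              # (1/m) A^dagger y, complex-valued in general

err = np.linalg.norm(theta_hat_1 - theta) ** 2 / np.linalg.norm(theta) ** 2
print(err, (1 - delta) / delta)         # the two numbers should be close
```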

Let's check

[Plot: empirical MSE_out vs. MSE_in for the matched filter]

Denoising

θ̂₁ ≈ θ + σ z,    z_i ~ N(0, 1)

Idea: Treat θ̂₁ as effective observations in denoising


Denoising by nonlocal means

ŷ = θ + σ z

θ̂_i = ( Σ_j W(i, j) ŷ_j ) / ( Σ_j W(i, j) )

W(i, j) = 1 if ‖Patch(i; ŷ) − Patch(j; ŷ)‖²₂ ≤ c σ²,   0 otherwise

[Buades, Coll, Morel, 2005]

θ̂ ≡ η(ŷ)
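Below is a minimal, deliberately slow Python sketch of this 0/1-weight variant of nonlocal means. The patch size, the search window, and the way the threshold is scaled (by the number of pixels per patch as well as σ²) are illustrative assumptions, not values taken from the slides or from Buades et al.

```python
import numpy as np

def nl_means_01(y_img, sigma, patch=5, window=10, c=3.0):
    """Nonlocal means with 0/1 weights: each pixel is replaced by the average of the
    pixels (in a search window) whose surrounding patch lies within c * patch^2 * sigma^2
    of its own patch in squared Euclidean distance."""
    h, w = y_img.shape
    r = patch // 2
    pad = np.pad(y_img, r, mode="reflect")
    thresh = c * patch * patch * sigma ** 2
    out = np.empty_like(y_img)
    for i in range(h):
        for j in range(w):
            p0 = pad[i:i + patch, j:j + patch]
            num = den = 0.0
            for k in range(max(0, i - window), min(h, i + window + 1)):
                for l in range(max(0, j - window), min(w, j + window + 1)):
                    pk = pad[k:k + patch, l:l + patch]
                    if np.sum((p0 - pk) ** 2) <= thresh:
                        num += y_img[k, l]
                        den += 1.0
            out[i, j] = num / den          # den >= 1: each pixel always matches itself
    return out

# Tiny usage example on a synthetic piecewise-constant image
rng = np.random.default_rng(0)
clean = np.zeros((64, 64)); clean[20:44, 20:44] = 1.0
sigma = 0.25
noisy = clean + sigma * rng.standard_normal(clean.shape)
denoised = nl_means_01(noisy, sigma)
print(np.mean((noisy - clean) ** 2), np.mean((denoised - clean) ** 2))
```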


Patches

[Figure: the image with two patches highlighted, centered at pixels i and j]

Will it work?

θ̂₂ = η(θ̂₁) = η(A† y) ≈ θ ?

Let's try

θ̂₁ = A† y = [image]        θ̂₂ = η(A† y) = [image]

Better than garbage!


How much better?

[Plot: MSE_out vs. MSE_in for the denoiser η]

How much better?

[Plot: MSE_out vs. MSE_in for the denoiser η, with reference curves c₁·x and c₂·√x. Which one fits?]

Let us repeat the denoising experiment


Let us repeat the denoising experiment: y = θ + σ z

[Figure: the noisy image y (top row) and the denoised η(y) (bottom row), for σ = 1, 0.5, 0.25, 0.12]

Quantitatively

[Plot: MSE_out vs. MSE_in for the denoiser, with reference curves c₁·x and c₂·√x]

Quantitatively

[Plot: the same data on log-log axes, with reference curves c₁·x and c₂·√x]

Approximate denoiser characterization

MSE_out = c · √MSE_in

(enough for our purposes)

(see also Maleki, Baraniuk, Narayan, 2012; Arias-Castro, Willett, 2012)

What we achieved so far

[Plots: MSE_out vs. MSE_in for the matched filter (left) and for the denoiser (right)]


What we achieved so far

[Plot: MSE_new vs. MSE_old for one step of matched filter followed by denoising]

What we achieved so far

[The same MSE_new vs. MSE_old plot on log-log axes]

What about iterating?
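Heuristically, composing the two maps established earlier (the matched-filter blow-up and the square-root denoiser characterization) suggests a one-dimensional recursion for the error from one iteration to the next. This composition is our reading of the plots above, not a claim stated in the slides:

```latex
\[
\mathrm{MSE}_{\mathrm{new}}
  \;\approx\; c\,\sqrt{\frac{1-\delta}{\delta}\,\mathrm{MSE}_{\mathrm{old}}}\,,
\qquad
\mathrm{MSE}_{\infty} \;=\; c^{2}\,\frac{1-\delta}{\delta}.
\]
```

Under this reading, the fixed point MSE_∞ is where the empirical curve crosses the diagonal in the MSE_new vs. MSE_old plots.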


How do we iterate?

Will tell you later!


t = 1, 2, ..., 9

[Figures: the reconstruction θ̂_t after each iteration t = 1, ..., 9, each shown next to the
log-log MSE_new vs. MSE_old plot tracking the iteration; the error keeps shrinking. At t = 3
the slide remarks "Non obvious!".]

t = 0, 1, 2, 3, ..., 20

[Plot: the MSE_new vs. MSE_old trajectory over 20 iterations (log-log)]

How do we iterate?


Approximate Message Passing (AMP)

θ̂^{2t}   = η(θ̂^{2t−1})

θ̂^{2t+1} = θ̂^{2t} + A† r^t

r^t = y − A θ̂^{2t} + b_t r^{t−1}

b_t = (1/m) div η(θ̂^{2t−1})

(can be computed explicitly)

[Thouless, Anderson, Palmer, 1977; Kabashima, 2003;
Donoho, Maleki, Montanari, 2009; Donoho, Johnstone, Montanari, 2012]
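A minimal, self-contained sketch of this recursion (an assumption-laden illustration, not the slides' image-denoising setup): here A is a dense Gaussian matrix with roughly unit-norm columns, and soft thresholding stands in for the denoiser η because its divergence is explicit (the number of surviving coordinates); any denoiser with a computable or estimable divergence could be plugged in instead.

```python
import numpy as np

def soft_threshold(x, lam):
    """Separable stand-in for the denoiser eta; its divergence is the number of
    coordinates left nonzero, which gives the Onsager coefficient b_t explicitly."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def amp(y, A, n_iter=20, alpha=2.0):
    """AMP recursion with a generic separable denoiser (soft thresholding here).
    Assumes A has approximately unit-norm columns, e.g. i.i.d. N(0, 1/m) entries."""
    m, n = A.shape
    theta = np.zeros(n)
    r = y.copy()
    for _ in range(n_iter):
        pseudo = theta + A.T @ r                 # effective observations
        sigma = np.sqrt(np.mean(r ** 2))         # empirical noise level of the pseudo-data
        theta = soft_threshold(pseudo, alpha * sigma)
        b = np.count_nonzero(theta) / m          # b_t = (1/m) * div eta
        r = y - A @ theta + b * r                # residual with the Onsager correction
    return theta

# Synthetic check (assumed setup): sparse theta, Gaussian A, noiseless y
rng = np.random.default_rng(0)
m, n, k = 250, 1000, 30
A = rng.standard_normal((m, n)) / np.sqrt(m)
theta_true = np.zeros(n)
theta_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
y = A @ theta_true
theta_hat = amp(y, A, n_iter=30)
print(np.linalg.norm(theta_hat - theta_true) / np.linalg.norm(theta_true))
```

Swapping soft_threshold for an image denoiser such as the nonlocal means above gives a denoising-based variant; when no explicit formula for the divergence is available, it can also be estimated numerically (for instance with a random-probe estimate).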

Connection with Belief Propagation

m_{i→j} = m_i + ε_{i→j}

Linearize in ε_{i→j}

Very different from naive mean field!


Connection with Perturbation, Optimization, Statistics?

- Robustness w.r.t. p_X ∈ DistributionClass
  (η = minimax denoiser in DistributionClass)

- Can rigorously track evolution over A random
  (What about A = A_det + ε A_rand?)


A list of theorems/pointers


A list of theorems/pointers

- Connection with optimization [Bayati, Montanari, 2011]

- Minimax theory for sparse/block-sparse/TV [Donoho, Johnstone, Montanari, 2012]

- Bayesian reconstruction up to information dimension [Donoho, Javanmard, Montanari, 2012]

- Universality [Bayati, Lelarge, Montanari, 2012]

- Analysis of Generalized AMP [Javanmard, Montanari, 2012]

- Application to sparse PCA [In preparation, 2013]

Related work

- (Non-rigorous) replica method
  [Tanaka 2002; Guo, Verdú 2005; Kabashima, Tanaka 2009; Rangan, Fletcher, Goyal 2009;
  Caire, Tulino, Shamai, Verdú 2012; ...]

- Alternative argument for robust regression (e.g. min_θ ‖y − Aθ‖₁)
  [Bean, Bickel, El Karoui, Lim, Yu 2012]

- Generalized linear models [Rangan 2011]

- Graphical model priors [Schniter et al. 2010-...]

- Low-rank matrices [Rangan, Fletcher 2012]

Conclusion


Conclusion

- Can do message passing/BP without Bayesian assumptions!

- There is something between naive mean field and BP

Thanks!

Noise distribution?

[Histogram: Frequency vs. error, with the error ranging roughly over (−5, 5)]
