Approximate Message Passing
Mohsen Bayati, David Donoho, Adel Javanmard, Iain Johnstone, Marc Lelarge, Arian Maleki, Andrea Montanari
Stanford University
December 8, 2012
Statistical estimation
y = f(x; noise)

x → unknown object
y → observations
f( · ; noise) → parametric model

Problem: Estimate x from observations y.
A broad convergence
- Statistics [Genomics, ...]
- Data mining [Collaborative filtering, Predictive analytics, ...]
- Signal processing [Compressive sampling, ...]
- Inverse problems [Medical imaging, Seismographic imaging, ...]

+ Data, + Computation, Exploit hidden structure
How should we think about these problems?
Optimization?
maximize   Likelihood(x | y) − Complexity(x)

'Separation principle':

- Modeler/statistician proposes a convex cost function.
- Optimization expert proposes a simple iterative algorithm.
- Run for 20 iterations and hope for the best.
How should we think about these problems?
Beyond separation?
y → x̂_1 → x̂_2 → x̂_3 → ...

- Constrained complexity per iteration
- Fixed number of iterations (say 20)
- What is the minimum MSE achievable?
Outline
- A long example (algorithm + heuristics)
- A list of theorems/pointers
A long example
What type of example?
- Image processing (because they make nice figures)
- Compressed sensing (simple)

WILL NOT MENTION SPARSITY!
An image
x = [image] ∈ C^n

Unknown object (n = 512^2 ≈ 2.5 × 10^5)
Noiseless linear measurements
y = Ax = A · [image]

Want to reconstruct x
Measurement structure
A = F̃ R

F̃ = subsampled Fourier matrix

R = diag(+1, −1, −1, +1, +1, ..., −1) = random ±1 modulation

⟹ y ∈ C^m,  m = 0.17 n
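To make the construction concrete, here is a minimal matrix-free sketch in Python/NumPy (illustrative, not the talk's code): A is applied as a random ±1 modulation followed by an unnormalized FFT and row subsampling, so A is never stored. The normalization (rows of squared norm n) is an assumption chosen to match the matched-filter scaling used below.

import numpy as np

rng = np.random.default_rng(0)
n = 512 * 512                                 # image size from the slides
m = int(0.17 * n)                             # m = 0.17 n measurements
R = rng.choice([-1.0, 1.0], size=n)           # diagonal of the random modulation R
keep = rng.choice(n, size=m, replace=False)   # which Fourier rows survive subsampling

def A_apply(x):
    """y = A x with A = (subsampled unnormalized FFT) . diag(R); A is never materialized."""
    return np.fft.fft(R * x)[keep]

def A_adjoint(y):
    """A† y: embed y into the kept rows, invert the FFT, undo the modulation.
    np.fft.ifft carries a 1/n factor, so multiply by n to get the true adjoint."""
    full = np.zeros(n, dtype=complex)
    full[keep] = y
    return R * np.fft.ifft(full) * n

y = A_apply(rng.standard_normal(n))           # y lives in C^m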
An approach popular in this community
y = Ax + z,   z ~ N(0, σ^2 I_{m×m})

p_{X|Y}(x | y) ∝ exp{ −(1/2σ^2) ‖y − Ax‖_2^2 } · p_X(x)

             ∝ ∏_{a=1}^{m} exp{ −(1/2σ^2) (y_a − A_a^T x)^2 } · ∏_{i=1}^{n} p_{X_i}(x_i)
Factor graph!
[Figure: bipartite factor graph; factor nodes a = 1, ..., m on top, variable nodes i = 1, ..., n below]

p_{X|Y}(x | y) ∝ ∏_{a=1}^{m} exp{ −(1/2σ^2) (y_a − A_a^T x)^2 } · ∏_{i=1}^{n} p_{X_i}(x_i)

Use BP!
Many issues
- Does anyone know the prior distribution of natural images?
- Computation per iteration, memory: Θ(mn).
- Very loopy graph.
- ...

Let us try something simpler!
Constructing a �rst estimate
y = Ax

Matched filter:

x̂_1 = (1/m) A† y
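A self-contained numerical check (sizes and the model for A are illustrative assumptions: real-valued A with orthogonal rows of squared norm n, mimicking the modulated Fourier rows above, so A† = A^T):

import numpy as np

rng = np.random.default_rng(1)
n, m = 400, 68                      # delta = m/n = 0.17, as in the slides
x = rng.standard_normal(n)

Q, _ = np.linalg.qr(rng.standard_normal((n, m)))  # m orthonormal columns
A = np.sqrt(n) * Q.T                # m x n, orthogonal rows of squared norm n
y = A @ x                           # noiseless measurements

x1 = (A.T @ y) / m                  # matched filter: x_hat_1 = (1/m) A^T y
print("relative error   :", np.sum((x1 - x) ** 2) / np.sum(x ** 2))
print("(1 - delta)/delta:", (1 - m / n) / (m / n))

The two printed numbers agree in expectation, anticipating the blowup computed on the next slides.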
How good is this?
E x̂_1 = (one-line calculation) = x
Check it out
x̂_1 = A† y = [image]        x = [image]

Does not look that good!
Idea
[image of x̂_1] = [image of x] + [image of 'noise']
How big is the `noise'?
E{‖x̂_1 − x‖_2^2} = (two-line calculation) = ((1 − δ)/δ) ‖x‖_2^2,   where δ = m/n
Matched �lter blows up noise
MSE_out = ((1 − δ)/δ) · MSE_in
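A sketch of the 'two-line calculation' in LaTeX, under the assumption (consistent with the construction above) that the rows of A are orthogonal with squared norm n, so that A†A = nP with P a projection onto a uniformly random m-dimensional subspace:

\hat{x}_1 = \frac{1}{m} A^\dagger A\, x = \frac{n}{m} P x ,
\qquad
\mathbb{E}\, P = \frac{m}{n} I_n \;\Longrightarrow\; \mathbb{E}\,\hat{x}_1 = x ,

% using \langle x, Px \rangle = \|Px\|_2^2 and \mathbb{E}\,\|Px\|_2^2 = (m/n)\|x\|_2^2 :
\mathbb{E}\,\|\hat{x}_1 - x\|_2^2
 = \frac{n^2}{m^2}\,\mathbb{E}\,\|Px\|_2^2 - \frac{2n}{m}\,\mathbb{E}\,\|Px\|_2^2 + \|x\|_2^2
 = \Big(\frac{n}{m} - 1\Big) \|x\|_2^2
 = \frac{1-\delta}{\delta}\, \|x\|_2^2 .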
Let's check
[Plot: MSE_out vs MSE_in for the matched filter (linear scale), consistent with the line MSE_out = ((1 − δ)/δ)·MSE_in]
Denoising
x̂_1 ≈ x + τ z,   z_i ~ N(0, 1)

Idea: Treat x̂_1 as effective observations in denoising
Denoising by nonlocal means
ŷ = x + τ z,

x̂_i = Σ_j W(i,j) ŷ_j / Σ_j W(i,j),

W(i,j) = 1 if ‖Patch(i; ŷ) − Patch(j; ŷ)‖_2^2 ≤ c τ^2,  0 otherwise

[Buades, Coll, Morel, 2005]

x̂ ≡ η(ŷ)
Patches
[Figure: two image patches, centered at pixels i and j]
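For concreteness, a deliberately naive O(N^2) sketch of this denoiser in Python/NumPy; the patch size, the constant c (which here absorbs the patch size), and the toy step-edge image are illustrative choices, not the settings used in the talk.

import numpy as np

def nl_means(y_img, patch=3, tau=0.25, c=27.0):
    """Nonlocal means with the 0/1 weights from the slide:
    W(i,j) = 1 if ||Patch(i) - Patch(j)||^2 <= c * tau^2, else 0,
    x_hat_i = sum_j W(i,j) y_j / sum_j W(i,j).
    c absorbs the patch size (3x3 patches -> c of order 3 per pixel)."""
    h, w = y_img.shape
    p = patch // 2
    padded = np.pad(y_img, p, mode="reflect")
    # every patch, flattened to a row vector
    patches = np.array([padded[i:i + patch, j:j + patch].ravel()
                        for i in range(h) for j in range(w)])
    flat = y_img.ravel()
    out = np.empty_like(flat)
    for i in range(flat.size):
        d2 = np.sum((patches - patches[i]) ** 2, axis=1)
        similar = d2 <= c * tau ** 2       # hard 0/1 weights
        out[i] = flat[similar].mean()      # W(i,i) = 1, so never empty
    return out.reshape(h, w)

# tiny demo: a noisy step edge
rng = np.random.default_rng(0)
x = np.zeros((32, 32)); x[:, 16:] = 1.0
tau = 0.25
y = x + tau * rng.standard_normal(x.shape)
x_hat = nl_means(y, tau=tau)
print("MSE in :", np.mean((y - x) ** 2))   # ~ tau^2
print("MSE out:", np.mean((x_hat - x) ** 2))

Real implementations restrict the search window and use smooth weights, but the hard 0/1 weights above are exactly the variant on the slide.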
Will it work?
x̂_2 = η(x̂_1) = η(A† y) ≈ x ?
Let's try
x̂_1 = A† y = [image]        x̂_2 = η(A† y) = [image]

Better than garbage!
How much better?
[Plot: MSE_out vs MSE_in for the denoiser (linear scale), compared with the reference curves c_1·x and c_2·√x]
Let us repeat the denoising experiment: y = x + τ z

[Figure: top row y, bottom row η(y), for τ = 1, 0.5, 0.25, 0.12]
Quantitatively
[Plot: MSE_out vs MSE_in (linear scale), with reference curves c_1·x and c_2·√x]
Quantitatively
[Plot: the same data on a log-log scale, with reference curves c_1·x and c_2·√x]
Approximate denoiser characterization
MSE_out = c · √(MSE_in)

(enough for our purposes)

(see also Maleki, Baraniuk, Narayan, 2012; Arias-Castro, Willett, 2012)
What we achieved so far
[Plots: MSE_out vs MSE_in for the matched filter (left) and the denoiser (right)]
What we achieved so far
[Plot: MSE_new vs MSE_old after one round of matched filter + denoising (linear scale)]
What we achieved so far

[Plot: MSE_new vs MSE_old on a log-log scale]

What about iterating?
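Before the 'how', a back-of-the-envelope 'what for': assume the two one-step characterizations above keep composing round after round (making that assumption legitimate is exactly what the correction term in AMP, below, is for). The MSE then follows a scalar recursion; a sketch with δ from the slides and a hypothetical denoiser constant c:

import numpy as np

delta, c = 0.17, 0.05            # undersampling ratio (slides); denoiser constant (hypothetical)
blowup = (1 - delta) / delta     # matched-filter step: MSE -> blowup * MSE
mse = 1.0                        # start at an MSE of order ||x||^2
for t in range(8):
    mse = c * np.sqrt(blowup * mse)   # one round: filter, then denoise
    print(t + 1, mse)
print("fixed point:", c ** 2 * blowup)   # solves m = c * sqrt(blowup * m)

The map m ↦ c·√(((1 − δ)/δ)·m) is increasing and concave, so the iteration converges geometrically to the fixed point m* = c^2 (1 − δ)/δ from any initialization.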
How do we iterate?
Will tell you later!
t = 1
x̂_1 = [image]
t = 2
x̂_2 = [image]
t = 3
x̂_3 = [image]

[Plot: MSE_new vs MSE_old (log-log scale), with the t = 3 iterate marked]

Non-obvious!
t = 4
x̂_4 = [image]

[Plot: MSE_new vs MSE_old (log-log scale), t = 4]
t = 5
x̂_5 = [image]

[Plot: MSE_new vs MSE_old (log-log scale), t = 5]
t = 6
x̂_6 = [image]

[Plot: MSE_new vs MSE_old (log-log scale), t = 6]
t = 7
x̂_7 = [image]

[Plot: MSE_new vs MSE_old (log-log scale), t = 7]
t = 8
x̂_8 = [image]

[Plot: MSE_new vs MSE_old (log-log scale), t = 8]
t = 9
x̂_9 = [image]

[Plot: MSE_new vs MSE_old (log-log scale), t = 9]
t = 0, 1, 2, 3, ..., 20

[Plot: MSE_new vs MSE_old (log-log scale), full trajectory over 20 iterations]
How do we iterate?
Approximate Message Passing (AMP)
x̂^{2t} = η(x̂^{2t−1})
x̂^{2t+1} = x̂^{2t} + A† r^t

r^t = y − A x̂^{2t} + b_t r^{t−1}

b_t = (1/m) · div η(x̂^{2t−1})      (can be computed explicitly)

[Thouless, Anderson, Palmer, 1977; Kabashima, 2003; Donoho, Maleki, Montanari, 2009; Donoho, Johnstone, Montanari, 2012]
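A minimal runnable sketch of this iteration (Python/NumPy), written as a single loop per round. To stay self-contained it substitutes entrywise soft thresholding for the nonlocal-means denoiser η, uses a dense Gaussian A, and an illustrative threshold rule; none of these choices come from the talk. For soft thresholding, div η is simply the number of nonzero outputs, which makes b_t explicit.

import numpy as np

def soft_threshold(v, lam):
    """eta(v) = sign(v) * max(|v| - lam, 0), applied entrywise."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def amp(y, A, iters=20, alpha=2.0):
    """AMP with a separable denoiser eta:
       x <- eta(x + A^T r),  r <- y - A x + b r,  b = (1/m) div eta."""
    m, n = A.shape
    x = np.zeros(n)
    r = y.copy()
    for _ in range(iters):
        tau = np.linalg.norm(r) / np.sqrt(m)   # estimate of the effective noise level
        x = soft_threshold(x + A.T @ r, alpha * tau)
        b = np.count_nonzero(x) / m            # Onsager coefficient b_t
        r = y - A @ x + b * r                  # residual with the memory term
    return x

# demo on a synthetic sparse signal
rng = np.random.default_rng(0)
n, m, k = 500, 250, 25
A = rng.standard_normal((m, n)) / np.sqrt(m)   # columns of (roughly) unit norm
x0 = np.zeros(n)
x0[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
y = A @ x0
print("MSE:", np.mean((amp(y, A) - x0) ** 2))

Dropping the b · r memory term turns this into plain iterative thresholding; the Onsager correction is what keeps the effective noise x + A^T r − x_0 approximately Gaussian across iterations, as in the histogram on the last slide.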
Connection with Belief Propagation
m_{i→j} = m_i + ε_{i→j}

Linearize in ε_{i→j}.

Very different from naive mean field!
Connection with Perturbation, Optimization,Statistics?
- Robustness w.r.t. p_X ∈ DistributionClass
  (η = minimax denoiser in DistributionClass)

- Can rigorously track the evolution for random A
  (What about A = A_det + ε A_rand?)
A list of theorems/pointers
- Connection with optimization [Bayati, Montanari, 2011]
- Minimax theory for sparse / block-sparse / TV [Donoho, Johnstone, Montanari, 2012]
- Bayesian reconstruction up to the information dimension [Donoho, Javanmard, Montanari, 2012]
- Universality [Bayati, Lelarge, Montanari, 2012]
- Analysis of generalized AMP [Javanmard, Montanari, 2012]
- Application to sparse PCA [in preparation, 2013]
Related work
- (Non-rigorous) replica method [Tanaka 2002; Guo, Verdú 2005; Kabashima, Tanaka 2009; Rangan, Fletcher, Goyal 2009; Caire, Tulino, Shamai, Verdú 2012; ...]
- Alternative argument for robust regression (e.g. min_x ‖y − Ax‖_1) [Bean, Bickel, El Karoui, Lim, Yu 2012]
- Generalized linear models [Rangan 2011]
- Graphical-model priors [Schniter et al., 2010-...]
- Low-rank matrices [Rangan, Fletcher 2012]
Conclusion
- Can do message passing / BP without Bayesian assumptions!
- There is something between naive mean field and BP.

Thanks!
Noise distribution?
[Histogram: Frequency vs error, over the range −5 to 5]