1
Analytic Solution of Hierarchical Variational Bayes Approach in Linear Inverse Problem
Shinichi Nakajima, Sumio Watanabe
Nikon Corporation
Tokyo Institute of Technology
2
Contents
- Introduction
  - Linear inverse problem
  - Hierarchical variational Bayes [Sato et al. 04]
  - James-Stein estimator
  - Purpose
- Theoretical analysis
  - Setting
  - Solution
- Discussion
- Conclusions
3
Linear inverse problem
$y = Va + \varepsilon$
- $a \in \mathbb{R}^M$ : parameter to be estimated
- $V$ : constant matrix
- $y \in \mathbb{R}^N$ : observable
- $\varepsilon$ : observation noise
Ill-posed! ($M > N$)
Example : Magnetoencephalography (MEG)
- $y$ : magnetic field detected by N detectors
- $a$ : electric current at M sites
- $V$ : lead field matrix
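To make the ill-posedness concrete, here is a minimal Python sketch with hypothetical numbers (N = 2 detectors, M = 3 sites, noise omitted). The matrix `V`, the currents `a1` and `a2`, and the helper names are illustrative assumptions, not values from the talk. Two different currents produce exactly the same observation, so the observation alone cannot identify the current.

```python
# Toy illustration of ill-posedness when M > N (hypothetical numbers).

def matvec(V, a):
    """Return V a for a matrix given as a list of rows."""
    return [sum(v * x for v, x in zip(row, a)) for row in V]

def cross(p, q):
    """Cross product of two 3-vectors; the result is orthogonal to p and q."""
    return [p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0]]

V = [[1.0, 0.0, 1.0],   # N = 2 rows (detectors), M = 3 columns (sites)
     [0.0, 1.0, 1.0]]

a1 = [1.0, 2.0, 0.0]
null_vec = cross(V[0], V[1])                 # satisfies V null_vec = 0
a2 = [x + z for x, z in zip(a1, null_vec)]   # a different current

y1 = matvec(V, a1)
y2 = matvec(V, a2)   # identical to y1 although a2 differs from a1
```

Any multiple of `null_vec` can be added to a solution without changing the observation, which is why a prior or a norm constraint is needed to pick one.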
4
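Methods for ill-posed problem
Model : $p(y \mid a) \propto \exp\left(-\frac{\|y - Va\|^2}{2}\right)$
Prior : $\varphi(a \mid B^{-2}) \propto \exp\left(-\frac{a^t B^{-2} a}{2}\right)$
1. Minimum norm maximum likelihood
   $\hat{a} = \arg\min_{a} \|a\|^2$ subject to $a \in \arg\max_{a} p(y \mid a)$, which gives $\hat{a} = V^t (V V^t)^{-1} y$
2. Maximum a posteriori (MAP)
   $\hat{a} = \arg\max_{a} p(y \mid a)\, \varphi(a \mid B^{-2})$, where $B^{-2}$ is constant.
3. Hierarchical Bayes
   $B^{-2}$ is also a parameter to be estimated!
1, 2 : similar.
3 : very different from 1, 2.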
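Methods 1 and 2 can be compared numerically. The sketch below is a toy Python example, assuming a unit-variance Gaussian model and a Gaussian prior with diagonal $B^2$; the names `inv2` and `map_estimate` and all numbers are hypothetical. It uses the standard identity $(V^t V + B^{-2})^{-1} V^t = B^2 V^t (V B^2 V^t + I)^{-1}$, so only an N x N inverse is needed (N = 2 here).

```python
# Minimum-norm ML and MAP estimates for a toy problem (N = 2, M = 3).

def inv2(S):
    """Inverse of a 2x2 matrix given as nested lists."""
    (p, q), (r, s) = S
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

def map_estimate(V, y, b2, noise=1.0):
    """Return B^2 V^t (V B^2 V^t + noise * I)^{-1} y with B^2 = diag(b2).

    noise=1.0 is the MAP estimate under the unit-variance model;
    noise=0.0 with b2 = [1, ..., 1] is the minimum-norm ML solution
    V^t (V V^t)^{-1} y. Only valid for N = 2 because of inv2.
    """
    N, M = len(V), len(V[0])
    S = [[sum(V[i][m] * b2[m] * V[k][m] for m in range(M))
          + (noise if i == k else 0.0) for k in range(N)] for i in range(N)]
    Sinv = inv2(S)
    z = [sum(Sinv[i][k] * y[k] for k in range(N)) for i in range(N)]
    return [b2[m] * sum(V[i][m] * z[i] for i in range(N)) for m in range(M)]

V = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
y = [1.0, 2.0]

a_mn = map_estimate(V, y, [1.0, 1.0, 1.0], noise=0.0)  # minimum-norm ML
a_map_weak = map_estimate(V, y, [1e6] * 3)             # nearly flat prior
a_map_ard = map_estimate(V, y, [1.0, 1.0, 1e-8])       # third prior variance near zero
```

With a nearly flat prior the MAP estimate approaches the minimum-norm solution (methods 1 and 2 are similar), while driving one diagonal element of $B^2$ toward zero switches the corresponding element of the estimate off, which is the effect hierarchical Bayes exploits when $B^{-2}$ is itself estimated.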
5
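Hierarchical Bayes
Model : $p(y \mid a) \propto \exp\left(-\frac{\|y - Va\|^2}{2}\right)$
Prior : $\varphi(a \mid B^{-2}) \propto \exp\left(-\frac{a^t B^{-2} a}{2}\right)$
Estimate $B^{-2}$ from observation, introducing a hyperprior : a product over $m = 1, \dots, M$ of densities on the diagonal elements $B_m^{-2}$, each with two hyperparameters indexed $0m$.
If we estimate $a$ and $B^{-2}$ by Bayesian methods, many small elements become zero (relevance determination), a.k.a. Automatic Relevance Determination (ARD) [MacKay94, Neal96].
Why? Singularities, hierarchy. See [9] if interested.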
6
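Hierarchical variational Bayes
But, Bayes estimation requires huge computational costs.
Apply VB [Sato et al. 04].
Trial posterior : $r(w) = r(A, B)$
Free energy : $F(r) = -S(r) + nE(r)$, where
$S(r) = -\langle \log r(w) \rangle_{r(w)}$,
$E(r) = -\frac{1}{n} \langle \log p(Y^n \mid w)\, \varphi(w) \rangle_{r(w)}$
Without restriction : Optimum = Bayes posterior
Restriction : $r(w) = r(A)\, r(B)$ (variational method)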
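Why the unrestricted optimum is the Bayes posterior can be checked in one line. The sketch below assumes the usual reading of the free energy as energy minus entropy, with prior $\varphi(w)$ and marginal likelihood $Z(Y^n)$; these symbols are assumptions where the slide text is not explicit.

```latex
\begin{align*}
F(r) &= \left\langle \log \frac{r(w)}{p(Y^n \mid w)\,\varphi(w)} \right\rangle_{r(w)} \\
     &= \left\langle \log \frac{r(w)}{p(w \mid Y^n)} \right\rangle_{r(w)} - \log Z(Y^n) \\
     &= \mathrm{KL}\big(r(w) \,\|\, p(w \mid Y^n)\big) - \log Z(Y^n),
\end{align*}
where $Z(Y^n) = \int p(Y^n \mid w)\,\varphi(w)\,dw$ and
$p(w \mid Y^n) = p(Y^n \mid w)\,\varphi(w) / Z(Y^n)$.
```

The KL term is nonnegative and vanishes only when the trial posterior equals the Bayes posterior, so minimizing $F$ without restriction returns the Bayes posterior. Restricting the trial posterior to a factorized form is what turns this into an approximation.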
7
James-Stein (JS) estimator
K-dimensional mean estimation (Regular model)
$z_1, \dots, z_n$ : samples
ML estimator (arithmetic mean) : $\bar{z} = \frac{1}{n} \sum_{i=1}^{n} z_i$
Domination of $\alpha$ over $\beta$ : $G_\alpha \le G_\beta$ for any true mean, and $G_\alpha < G_\beta$ for a certain true mean.
ML is efficient (never dominated by any unbiased estimator), but is inadmissible (dominated by a biased estimator) when $K \ge 3$ [Stein56].
James-Stein estimator [James&Stein61] : $\hat{\mu}_{js} = \left(1 - \frac{K - 2}{n \|\bar{z}\|^2}\right) \bar{z}$, where the term in parentheses is the shrinkage factor.
[Figure: JS (K=3) compared with ML]
A certain relation between EB and JS was discussed in [Efron&Morris73].
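The domination of JS over ML can be observed in a small simulation. This is a minimal Python sketch, assuming unit-variance Gaussian samples and the textbook James-Stein form with shrinkage toward the origin; the function names, the true mean, and the trial counts are hypothetical choices made for illustration.

```python
import random

def ml_estimate(samples):
    """Arithmetic mean of K-dimensional samples (the ML estimator)."""
    n, K = len(samples), len(samples[0])
    return [sum(z[k] for z in samples) / n for k in range(K)]

def james_stein(samples):
    """James-Stein estimator for unit-variance Gaussian samples (K >= 3)."""
    n = len(samples)
    zbar = ml_estimate(samples)
    K = len(zbar)
    shrink = 1.0 - (K - 2) / (n * sum(x * x for x in zbar))
    return [shrink * x for x in zbar]

def compare_risks(mu, n, trials, seed=0):
    """Monte Carlo average squared error of ML and JS on the same data."""
    rng = random.Random(seed)
    err_ml = err_js = 0.0
    for _ in range(trials):
        samples = [[rng.gauss(m, 1.0) for m in mu] for _ in range(n)]
        ml = ml_estimate(samples)
        js = james_stein(samples)
        err_ml += sum((e - m) ** 2 for e, m in zip(ml, mu))
        err_js += sum((e - m) ** 2 for e, m in zip(js, mu))
    return err_ml / trials, err_js / trials

# K = 5, true mean close to the origin, n = 10 samples per trial.
risk_ml, risk_js = compare_risks(mu=[0.5, -0.5, 0.0, 0.0, 0.0], n=10, trials=2000)
```

The ML risk should be close to K/n, and the JS risk should come out smaller; the gap shrinks as the true mean moves away from the origin.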
8
Purpose
[Sato et al. 04] derived a simple iterative algorithm based on HVB in an MEG application, and experimentally showed good performance.
We theoretically analyze HVB and derive its solution, and discuss the relation between HVB and the positive-part JS estimator, focusing on a simplified version of Sato's approach.
Positive-part JS : $\mathrm{PJS}(z; \chi) = \theta(n \|z\|^2 > \chi) \left(1 - \frac{\chi}{n \|z\|^2}\right) z$
- $\theta(\mathrm{event})$ : 1 if the event is true, 0 if it is false
- $\chi$ : degree of shrinkage
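A positive-part JS operator is short enough to write out. The Python sketch below assumes the form $\mathrm{PJS}(z; \chi) = \theta(n\|z\|^2 > \chi)\,(1 - \chi / (n\|z\|^2))\, z$, with $\chi$ the degree of shrinkage; the function name `pjs`, its argument order, and the example numbers are hypothetical.

```python
def pjs(z, chi, n):
    """Positive-part James-Stein operator (assumed form, see lead-in).

    Returns the zero vector when n * ||z||^2 <= chi, otherwise
    shrinks z by the factor 1 - chi / (n * ||z||^2).
    """
    energy = n * sum(x * x for x in z)
    if energy <= chi:
        return [0.0 for _ in z]
    factor = 1.0 - chi / energy
    return [factor * x for x in z]

kept = pjs([3.0, 4.0], chi=5.0, n=1)     # energy 25 > 5: shrunk by factor 0.8
pruned = pjs([0.1, 0.1], chi=5.0, n=1)   # energy 0.02 <= 5: set to zero
```

The hard threshold is what produces exact zeros, in contrast to the plain JS estimator, whose factor can become negative for small inputs.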
10
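Setting
Consider time series data.
[Figure: $a'$ and $b$ plotted against time $u$ over a duration $U$]
ARD Model : $p(y \mid a) \propto \exp\left(-\frac{\|y - Va\|^2}{2}\right)$
Prior : $\varphi(a \mid B^{-2}) \propto \exp\left(-\frac{a^t B^{-2} a}{2}\right)$
$A = \mathrm{diag}(a)$, $B = \mathrm{diag}(b)$
Use constant hyperparameter during $U$ [Sato et al. 04].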
11
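Summary of setting
Observable : $y^{(u)} \in \mathbb{R}^N$
Parameter : $a^{(u)} \in \mathbb{R}^M$
Hyperparameter (constant during $U$) : $B = \mathrm{diag}(b)$, where $b \in \mathbb{R}^M$
Constant matrix : $V = (v_1, \dots, v_M)$
Model : $p(y \mid a, b) = \prod_{u=1}^{U} N_N\left(y^{(u)};\ V \sum_{m=1}^{M} a_m^{(u)} b_m e_m,\ I_N\right)$
Priors : $\varphi(a) = \prod_{u=1}^{U} N_M\left(a^{(u)}; 0, c_a I_M\right)$, $\varphi(b) = N_M\left(b; 0, c_b I_M\right)$
where
- $N_d(y; \mu, \Sigma)$ : d-dimensional normal
- $I_d$ : identity matrix
- $e_m = (0, \dots, 0, 1, 0, \dots, 0)^t$, with the 1 in the m-th element
- $n$ : # of samples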
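The setting can be simulated directly. The following Python sketch assumes that the mean of each observation is $V$ applied to the elementwise product of $a^{(u)}$ and $b$, with unit-variance noise; the helper names `mean_signal` and `simulate` and all numbers are hypothetical.

```python
import random

def mean_signal(V, a_u, b):
    """Noise-free observation V (a_u * b), with * the elementwise product."""
    M = len(b)
    return [sum(row[m] * a_u[m] * b[m] for m in range(M)) for row in V]

def simulate(V, a, b, n, rng):
    """Draw n samples of y^(u) ~ N(V (a^(u) * b), I_N) for every time u.

    V: N x M list of rows, a: list of U vectors of length M, b: length M.
    Returns Y with Y[i][u] a length-N observation.
    """
    Y = []
    for _ in range(n):
        Y.append([[mu + rng.gauss(0.0, 1.0) for mu in mean_signal(V, a_u, b)]
                  for a_u in a])
    return Y

V = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
a = [[1.0, 2.0, 3.0], [1.5, 2.5, 3.5]]   # U = 2 time points
b = [1.0, 0.0, 1.0]                      # site 2 is switched off for all u
Y = simulate(V, a, b, n=4, rng=random.Random(0))
```

Because `b` is shared across time, setting one of its elements to zero removes that site from every observation in the window, which is how a constant hyperparameter couples the U time points.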
12
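Variational condition
Restriction : $r(\{a^{(u)}\}, b) = \prod_{m=1}^{M} r(\tilde{a}_m)\, r(b_m)$, where $\tilde{a}_m = (a_m^{(1)}, \dots, a_m^{(U)})^t$
Variational method :
$r(\tilde{a}_m) \propto \varphi(\tilde{a}_m) \exp \langle \log p(Y^n \mid w) \rangle_{r(w)/r(\tilde{a}_m)}$
$r(b_m) \propto \varphi(b_m) \exp \langle \log p(Y^n \mid w) \rangle_{r(w)/r(b_m)}$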
13
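Theorem 1
Theorem 1: The VB estimator of the m-th element is given by
$(\hat{\tilde{a}}_m \hat{b}_m)^{VB} = \mathrm{PJS}\left(\tilde{j}_m / \|v_m\|^2;\ U / \|v_m\|^2\right) + O_p(n^{-1})$,
where
$j_m^{(u)} = v_m^t \bar{y}^{(u)} - \sum_{m' \ne m} v_m^t v_{m'} (\hat{a}_{m'}^{(u)} \hat{b}_{m'})^{VB}$, $\tilde{j}_m = (j_m^{(1)}, \dots, j_m^{(U)})^t$, $\bar{y}^{(u)} = \frac{1}{n} \sum_{i=1}^{n} y_i^{(u)}$
Not explicit! (The right-hand side depends on the VB estimators of the other elements.)
HVB solution is similar to the positive-part JS estimator with degree of shrinkage proportional to U.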
15
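Proposition
Simply use positive-part JS estimator :
$\mathrm{PJS}\left((\hat{\tilde{a}}_m \hat{b}_m)^{MN};\ U / \|v_m\|^2\right)$
where $(\hat{a}_m^{(u)} \hat{b}_m)^{MN}$ is the m-th element of the minimum-norm estimator $(V^t V)^{+} V^t \bar{y}^{(u)}$.
Only requires calculation of Moore-Penrose inverse. (HVB needs iterative calculation.)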
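The proposed estimator can be sketched in a few lines of Python. This is a toy version under stated assumptions: N = 2 and V has full row rank, so the Moore-Penrose inverse reduces to $V^t (V V^t)^{-1}$; the positive-part JS operator is taken as $\theta(n\|z\|^2 > \chi)(1 - \chi/(n\|z\|^2))\,z$ with $\chi = U / \|v_m\|^2$. All function names and numbers are hypothetical.

```python
def inv2(S):
    """Inverse of a 2x2 matrix given as nested lists."""
    (p, q), (r, s) = S
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

def min_norm(V, y):
    """V^t (V V^t)^{-1} y for a 2 x M matrix V with full row rank."""
    N, M = len(V), len(V[0])
    G = [[sum(V[i][m] * V[k][m] for m in range(M)) for k in range(N)]
         for i in range(N)]
    Ginv = inv2(G)
    z = [sum(Ginv[i][k] * y[k] for k in range(N)) for i in range(N)]
    return [sum(V[i][m] * z[i] for i in range(N)) for m in range(M)]

def pjs(z, chi, n):
    """Positive-part James-Stein operator (assumed form, see lead-in)."""
    energy = n * sum(x * x for x in z)
    if energy <= chi:
        return [0.0 for _ in z]
    return [(1.0 - chi / energy) * x for x in z]

def pjs_on_min_norm(V, ybar, n):
    """ybar[u] is the sample mean of y^(u); returns est[m][u]."""
    U, M = len(ybar), len(V[0])
    mn = [min_norm(V, y_u) for y_u in ybar]        # mn[u][m]
    est = []
    for m in range(M):
        col_norm2 = sum(row[m] ** 2 for row in V)  # ||v_m||^2
        series = [mn[u][m] for u in range(U)]      # time series of site m
        est.append(pjs(series, U / col_norm2, n))
    return est

V = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
est_strong = pjs_on_min_norm(V, [[3.0, 0.0], [3.0, 0.0]], n=10)
est_weak = pjs_on_min_norm(V, [[0.1, 0.0], [0.1, 0.0]], n=10)
```

With a strong signal every site is kept and only mildly shrunk; with a weak signal every site falls below its threshold and is set exactly to zero. No iteration is involved, in contrast to HVB.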
16
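Difference between VB and JS
$V = (v_1, \dots, v_M)$, $y = Va + \varepsilon$
- When the $v_m$s are orthogonal, asymptotically equivalent:
  $(\hat{\tilde{a}}_m \hat{b}_m)^{VB} = \mathrm{PJS}\left((\hat{\tilde{a}}_m \hat{b}_m)^{MN};\ U / \|v_m\|^2\right) + O_p(n^{-1})$
- When all $v_m$s are parallel or orthogonal, comparing $(\hat{\tilde{a}}_m \hat{b}_m)^{VB}$ with $\mathrm{PJS}\left((\hat{\tilde{a}}_m \hat{b}_m)^{MN};\ U / \|v_m\|^2\right) + O_p(n^{-1})$ shows that JS suppresses overfitting more than HVB (enhances relevance determination).
- Otherwise, future work.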
18
Conclusions & future work
Conclusions
- HVB provides similar result to JS estimation in linear inverse problem.
- Time duration U affects learning (large U enhances relevance determination).
Future work
- Difference from JS.
- Bounds of generalization error.
[Figure: $a'$ and $b$ plotted against time $u$ over a duration $U$]
19
Thank you!