1

Analytic Solution of Hierarchical Variational Bayes Approach in Linear Inverse Problem

Shinichi Nakajima, Sumio Watanabe

Nikon Corporation

Tokyo Institute of Technology

2

Contents

- Introduction
  - Linear inverse problem
  - Hierarchical variational Bayes [Sato et al. 04]
  - James-Stein estimator
  - Purpose
- Theoretical analysis
  - Setting
  - Solution
- Discussion
- Conclusions

3

Linear inverse problem

$y = Va + \varepsilon$

$y \in \mathbb{R}^N$: observable
$a \in \mathbb{R}^M$: parameter to be estimated
$V$: constant matrix
$\varepsilon$: observation noise

Ill-posed when $M > N$!

Example: Magnetoencephalography (MEG)

$y$: magnetic field detected by N detectors
$a$: electric current at M sites
$V$: lead field matrix
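The ill-posedness for more sites than detectors can be checked numerically. The following is a minimal NumPy sketch, not taken from the slides: the sizes, the seed and the random stand-in for the matrix V are all illustrative assumptions. It shows that adding a null-space vector of V to the parameter leaves the noiseless observation unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 3, 5                       # assumed sizes: fewer detectors than sites
V = rng.standard_normal((N, M))   # random stand-in for the constant matrix V
a = rng.standard_normal(M)        # one candidate parameter vector

# With full_matrices=True (the default), the last M - N rows of Vt
# span the null space of V when V has rank N.
_, _, Vt = np.linalg.svd(V)
null_vec = Vt[-1]

a_other = a + 2.0 * null_vec      # a different parameter vector
same_y = np.allclose(V @ a, V @ a_other)
print(same_y)
```

Two distinct parameter vectors explain the same data equally well, so extra information (a norm criterion or a prior) is needed to single one out.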

4

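Methods for ill-posed problem

Model: $p(y \mid a) \propto \exp\left(-\frac{\|y - Va\|^2}{2}\right)$

Prior: $\varphi(a \mid B^{-2}) \propto \exp\left(-\frac{a^t B^{-2} a}{2}\right)$

1. Minimum norm maximum likelihood

$\hat{a} = \operatorname{argmin}_a \|a\|^2$ subject to $a \in \operatorname{argmax}_a p(y \mid a)$, i.e. $\hat{a} = V^t (V V^t)^{-1} y$

2. Maximum a posteriori (MAP)

$\hat{a} = \operatorname{argmax}_a p(y \mid a)\, \varphi(a \mid B^{-2})$, where $B^{-2}$ is constant.

3. Hierarchical Bayes

$B^{-2}$ is also a parameter to be estimated!

1, 2: similar.

3: very different from 1, 2.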
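Methods 1 and 2 can be compared in a few lines. This is a hedged sketch under stated assumptions: unit noise variance, a diagonal constant $B^{-2}$, and a random full-row-rank V. The function names `min_norm_ml` and `map_estimate` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def min_norm_ml(V, y):
    """Minimum-norm maximizer of p(y|a): V^t (V V^t)^{-1} y (V needs full row rank)."""
    return V.T @ np.linalg.solve(V @ V.T, y)

def map_estimate(V, y, b_inv2):
    """MAP estimate for a constant diagonal B^{-2}: (V^t V + B^{-2})^{-1} V^t y."""
    return np.linalg.solve(V.T @ V + np.diag(b_inv2), V.T @ y)

rng = np.random.default_rng(1)
N, M = 4, 8                                    # illustrative sizes with M > N
V = rng.standard_normal((N, M))
y = rng.standard_normal(N)

a_mn = min_norm_ml(V, y)
a_map = map_estimate(V, y, np.full(M, 1e-8))   # very weak, uniform prior
```

With a very weak uniform prior the MAP estimate approaches the minimum-norm solution, which is one way to read the remark that methods 1 and 2 are similar.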

5

Hierarchical Bayes

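Model: $p(y \mid a) \propto \exp\left(-\frac{\|y - Va\|^2}{2}\right)$

Prior: $\varphi(a \mid B^{-2}) \propto \exp\left(-\frac{a^t B^{-2} a}{2}\right)$

Estimate $B^{-2}$ from observation, introducing a hyperprior: a product over $m = 1, \dots, M$ of densities on $B_m^{-2}$, with hyperparameters indexed by $0m$.

If we estimate $a$ and $B^{-2}$ by Bayesian methods, many small elements become zero (relevance determination), a.k.a. Automatic Relevance Determination (ARD) [MacKay94, Neal96].

Why? Singularities, hierarchy. See [9] if interested.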

6

Hierarchical variational Bayes

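But Bayes estimation requires huge computational costs.

Apply VB [Sato et al. 04].

Trial posterior: $r(w) = r(A, B)$

Free energy: $F(r) = -S(r) + nE(r)$,

where $S(r) = -\left\langle \log r(w) \right\rangle_{r(w)}$ and $E(r) = -\frac{1}{n} \left\langle \log p(Y^n \mid w)\, \varphi(w) \right\rangle_{r(w)}$.

Optimum = Bayes posterior

Restriction: $r(w) = r(A)\, r(B)$ (variational method)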
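The statement that the unrestricted optimum is the Bayes posterior follows from a one-line rearrangement. The block below is a sketch assuming the usual sign conventions, so that the free energy equals the expected log ratio of the trial posterior to likelihood times prior. The symbol $Z(Y^n)$ for the marginal likelihood is notation introduced here, not taken from the slides.

```latex
\begin{align*}
F(r) &= \left\langle \log \frac{r(w)}{p(Y^n \mid w)\,\varphi(w)} \right\rangle_{r(w)} \\
     &= \left\langle \log \frac{r(w)}{p(w \mid Y^n)} \right\rangle_{r(w)} - \log Z(Y^n) \\
     &= \mathrm{KL}\left(r(w) \,\|\, p(w \mid Y^n)\right) - \log Z(Y^n),
\end{align*}
% Z(Y^n) = \int p(Y^n \mid w)\,\varphi(w)\,dw  (assumed notation for the marginal likelihood)
```

Since the KL divergence is non-negative and vanishes only when $r(w) = p(w \mid Y^n)$, minimizing $F$ without restriction recovers the Bayes posterior. The factorization restriction makes the minimization tractable at the cost of a nonzero KL term.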

7

James-Stein (JS) estimator

K-dimensional mean estimation (regular model)

$z_1, \dots, z_n$: samples

$\mu$: true mean

ML estimator (arithmetic mean): $\bar{z} = \frac{1}{n} \sum_{i=1}^{n} z_i$

Domination of $\alpha$ over $\beta$: $G_\alpha \le G_\beta$ for any true $\mu$, and $G_\alpha < G_\beta$ for a certain true $\mu$.

ML is efficient (never dominated by any unbiased estimator), but is inadmissible (dominated by a biased estimator) when $K \ge 3$ [Stein56].

James-Stein estimator [James&Stein61], with shrinkage factor:

$\hat{\mu}_{\mathrm{JS}} = \left(1 - \frac{K - 2}{n \|\bar{z}\|^2}\right) \bar{z}$

[Figure: JS (K=3) compared with ML.]

A certain relation between empirical Bayes (EB) and JS was discussed in [Efron&Morris73].
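The domination of ML by JS can be illustrated with a small Monte Carlo experiment. This is a sketch under assumptions that are not in the slides: unit-variance Gaussian samples, a true mean at the origin, squared error as the loss, and the estimator written as $(1 - (K-2)/(n\|\bar{z}\|^2))\bar{z}$. The function name `james_stein` is hypothetical.

```python
import numpy as np

def james_stein(z_bar, n):
    """JS estimate from the sample mean z_bar of n unit-variance samples."""
    K = z_bar.shape[0]
    shrink = 1.0 - (K - 2) / (n * np.sum(z_bar ** 2))
    return shrink * z_bar

rng = np.random.default_rng(0)
K, n, trials = 3, 10, 20000
mu = np.zeros(K)                  # assumed true mean for this experiment

# The mean of n unit-variance samples is distributed as N(mu, I/n),
# so the sample means are drawn directly.
z_bars = mu + rng.standard_normal((trials, K)) / np.sqrt(n)
risk_ml = np.mean(np.sum((z_bars - mu) ** 2, axis=1))
risk_js = np.mean([np.sum((james_stein(z, n) - mu) ** 2) for z in z_bars])
print(risk_ml, risk_js)
```

The gap between the two average losses is largest when the true mean is near the origin and shrinks as the true mean moves away from it.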

8

Purpose

[Sato et al. 04] derived a simple iterative algorithm based on HVB in an MEG application and experimentally showed good performance.

We theoretically analyze HVB and derive its solution, and discuss the relation between HVB and the positive-part JS estimator, focusing on a simplified version of Sato's approach.

Positive-part JS:

$\mathrm{PJS}(z; \chi) = \theta\left(n \|z\|^2 > \chi\right) \left(1 - \frac{\chi}{n \|z\|^2}\right) z$

where $\theta(\text{event}) = 1$ if the event is true and $0$ if it is false, and $\chi$ is the degree of shrinkage.
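A direct implementation makes the thresholding behaviour concrete. The sketch below assumes the positive-part form $\theta(n\|z\|^2 > \chi)(1 - \chi/(n\|z\|^2))z$. The function name `pjs` and the example values are illustrative only.

```python
import numpy as np

def pjs(z, chi, n):
    """Positive-part JS: theta(n|z|^2 > chi) * (1 - chi / (n|z|^2)) * z."""
    z = np.asarray(z, dtype=float)
    energy = n * np.sum(z ** 2)
    if energy <= chi:              # event is false: the estimate is exactly zero
        return np.zeros_like(z)
    return (1.0 - chi / energy) * z

small = pjs([0.1, 0.1, 0.1], chi=1.0, n=10)   # n|z|^2 = 0.3, below the threshold
large = pjs([1.0, 1.0, 1.0], chi=1.0, n=10)   # n|z|^2 = 30, above the threshold
```

Inputs whose scaled energy falls below the degree of shrinkage are set exactly to zero, which is the mechanism behind relevance determination. Larger inputs are only mildly shrunk.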

10

Setting

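Consider time series data.

ARD model: $p(y \mid a) \propto \exp\left(-\frac{\|y - Va\|^2}{2}\right)$

Prior: $\varphi(a \mid B^{-2}) \propto \exp\left(-\frac{a^t B^{-2} a}{2}\right)$

$A = \mathrm{diag}(a)$, $B = \mathrm{diag}(b)$

Use constant hyperparameter during U [Sato et al. 04].

[Figure: $a'$ and $b$ plotted against time $u$ over the period $U$.]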

11

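Summary of setting

Observable: $y^{(u)} \in \mathbb{R}^N$

Parameter: $a^{(u)} \in \mathbb{R}^M$

Hyperparameter (constant during U): $B = \mathrm{diag}(b)$, where $b \in \mathbb{R}^M$

Constant matrix: $V = (v_1, \dots, v_M)$

$n$: # of samples

Model:

$p(\{y^{(u)}\} \mid \{a^{(u)}\}, b) = \prod_{u=1}^{U} \mathcal{N}_N\left(y^{(u)};\; V \sum_{m=1}^{M} b_m a_m^{(u)} e_m,\; I_N\right)$

Priors:

$\varphi(\{a^{(u)}\}) = \prod_{u=1}^{U} \mathcal{N}_M\left(a^{(u)};\; 0,\; c_a I_M\right)$

$\varphi(b) = \mathcal{N}_M\left(b;\; 0,\; c_b I_M\right)$

where $\mathcal{N}_d(y; \mu, \Sigma)$ is the $d$-dimensional normal, $I_d$ is the identity matrix, and $e_m = (0, \dots, 0, 1, 0, \dots, 0)^t$ has 1 as its $m$-th element.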
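The time-series setting can be simulated to make the roles of the symbols concrete. This is only a sketch under assumptions: the observation mean is taken to be $\sum_m b_m a_m^{(u)} v_m$ with unit noise covariance, $c_a$ and $c_b$ are treated as prior variances, and all sizes and the random V are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, U, n = 4, 6, 5, 100          # illustrative sizes
c_a, c_b = 1.0, 1.0                # assumed prior variances
V = rng.standard_normal((N, M))    # random stand-in for the constant matrix

b = rng.normal(0.0, np.sqrt(c_b), size=M)        # hyperparameter, constant during U
A = rng.normal(0.0, np.sqrt(c_a), size=(M, U))   # column u holds a^(u)

mean = V @ (b[:, None] * A)        # (N, U): column u is sum_m b_m a_m^(u) v_m
samples = mean[None] + rng.standard_normal((n, N, U))   # n noisy samples per time point
Y_bar = samples.mean(axis=0)       # sample means y^(u), one column per time point
```

Each site shares one scale $b_m$ across all U time points while $a_m^{(u)}$ varies with time, which is why a longer period U gives more evidence about whether a site is relevant.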

12

Variational condition

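Restriction:

$r(\{a^{(u)}\}, b) = \prod_{m=1}^{M} r(\tilde{a}_m)\, r(b_m)$, where $\tilde{a}_m = (a_m^{(1)}, \dots, a_m^{(U)})$

Variational method:

$r(\tilde{a}_m) \propto \varphi(\tilde{a}_m) \exp \left\langle \log p(Y^n \mid w) \right\rangle_{r(w)/r(\tilde{a}_m)}$

$r(b_m) \propto \varphi(b_m) \exp \left\langle \log p(Y^n \mid w) \right\rangle_{r(w)/r(b_m)}$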

13

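Theorem 1

Theorem 1: The VB estimator of the $m$-th element is given by

$\hat{b}_m \hat{\tilde{a}}_m^{\mathrm{VB}} = \mathrm{PJS}\left(\frac{\tilde{j}_m}{\|v_m\|^2};\; \frac{U}{\|v_m\|^2}\right) + O_p(n^{-1})$

where $\tilde{j}_m$ collects $j_m^{(u)}$ over $u = 1, \dots, U$,

$j_m^{(u)} = v_m^t y^{(u)} - \sum_{m' \ne m} \hat{b}_{m'} \hat{a}_{m'}^{\mathrm{VB}(u)}\, v_m^t v_{m'}$

$y^{(u)} = \frac{1}{n} \sum_{i=1}^{n} y_i^{(u)}$

Not explicit! ($j_m^{(u)}$ depends on the VB estimates of the other elements.)

HVB solution is similar to positive-part JS estimator with degree of shrinkage proportional to U.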

15

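Proposition

Simply use the positive-part JS estimator:

$\mathrm{PJS}\left(\hat{b}_m \hat{\tilde{a}}_m^{\mathrm{MN}};\; \frac{U}{\|v_m\|^2}\right)$

where $\hat{b}_m \hat{a}_m^{\mathrm{MN}(u)}$ is the $m$-th element of the minimum-norm (MN) estimate $(V^t V)^{+} V^t y^{(u)}$.

Only requires calculation of the Moore-Penrose inverse. (HVB needs iterative calculation.)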
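The proposal can be sketched in a few lines of NumPy. The assumptions here are: the minimum-norm estimate is computed with `np.linalg.pinv`, the positive-part form is $\theta(n\|z\|^2 > \chi)(1 - \chi/(n\|z\|^2))z$, and the degree of shrinkage is $U/\|v_m\|^2$. The names `pjs` and `pjs_from_min_norm` are hypothetical helpers, and the toy input is purely illustrative.

```python
import numpy as np

def pjs(z, chi, n):
    """Positive-part JS applied to the vector z with degree of shrinkage chi."""
    z = np.asarray(z, dtype=float)
    energy = n * np.sum(z ** 2)
    return np.zeros_like(z) if energy <= chi else (1.0 - chi / energy) * z

def pjs_from_min_norm(V, Y, n):
    """V: (N, M) constant matrix; Y: (N, U) sample means y^(u); n: # of samples."""
    U = Y.shape[1]
    a_mn = np.linalg.pinv(V) @ Y        # (M, U) minimum-norm estimates
    v_norm2 = np.sum(V ** 2, axis=0)    # ||v_m||^2 for each column of V
    rows = [pjs(a_mn[m], U / v_norm2[m], n) for m in range(V.shape[1])]
    return np.vstack(rows)

# Toy check with an identity V: one strong site and one near-silent site.
est = pjs_from_min_norm(np.eye(2), np.array([[3.0, 4.0], [0.01, 0.01]]), n=1)
```

In the toy check the strong site is shrunk slightly and the near-silent site is set to zero, with no iteration involved.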

16

Difference between VB and JS

$V = (v_1, \dots, v_M)$, $y = Va$ + noise

- When the $v_m$s are orthogonal, HVB and JS are asymptotically equivalent:

  $\hat{b}_m \hat{\tilde{a}}_m^{\mathrm{VB}} = \mathrm{PJS}\left(\hat{b}_m \hat{\tilde{a}}_m^{\mathrm{MN}};\; \frac{U}{\|v_m\|^2}\right) + O_p(n^{-1})$

- When all $v_m$s are parallel or orthogonal, JS suppresses overfitting more than HVB (enhances relevance determination).

- Otherwise, future work.
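The orthogonal case can be worked out explicitly. The block below is a sketch under two assumptions: $j_m^{(u)} = v_m^t y^{(u)} - \sum_{m' \ne m} \hat{b}_{m'} \hat{a}_{m'}^{\mathrm{VB}(u)} v_m^t v_{m'}$, and nonzero, mutually orthogonal columns $v_m$ (so that $M \le N$ and the Moore-Penrose inverse of $V$ is $(V^t V)^{-1} V^t$).

```latex
\begin{align*}
v_m^t v_{m'} = 0 \ (m' \neq m)
  &\;\Rightarrow\; j_m^{(u)} = v_m^t y^{(u)}, \\
V^t V = \mathrm{diag}\left(\|v_1\|^2, \dots, \|v_M\|^2\right)
  &\;\Rightarrow\; \left[(V^t V)^{-1} V^t y^{(u)}\right]_m
     = \frac{v_m^t y^{(u)}}{\|v_m\|^2}
     = \frac{j_m^{(u)}}{\|v_m\|^2}.
\end{align*}
```

Under these assumptions the minimum-norm estimate and the scaled $j_m^{(u)}$ coincide, so the two positive-part JS expressions take the same argument and differ only by the $O_p(n^{-1})$ term.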

18

Conclusions & future work

Conclusions

- HVB provides a result similar to JS estimation in the linear inverse problem.
- Time duration U affects learning (large U enhances relevance determination).

Future work

- Difference from JS.
- Bounds of generalization error.

[Figure: $a'$ and $b$ plotted against time $u$ over the period $U$.]

19

Thank you!
