Finding religion: kernels and the Bayesian persuasion

Rev. Dr. Sayan Mukherjee
Department of Statistical Science
Institute for Genome Sciences & Policy
Department of Computer Science
Duke University

May 7, 2007



Not a Bayesian




What makes someone Bayesian

Is it Bayes rule?

Prob(parameters | data) = Lik(data | parameters) · π(parameters) / Prob(data).

NO!!! Necessary, but nowhere near sufficient.



Why I am a Bayesian

Bayesian statistics is about embracing and formally modelling uncertainty.



A simple example

I draw points x1, ..., xn i.i.d. from a normal distribution; I want to know the mean µ and I know σ = 1. My likelihood and prior are

Lik(x1, ..., xn | µ) = ∏_{i=1}^{n} (1/√(2π)) exp(−|xi − µ|²/2),

π(µ) = (1/√(2π)) exp(−|µ − 5|²/2).

The posterior can be computed in closed form and is a product of normals:

p(µ | x1, ..., xn) = Lik(x1, ..., xn | µ) π(µ) / ∫_{−∞}^{∞} Lik(x1, ..., xn | µ) π(µ) dµ.



Relevant papers

Characterizing the function space for Bayesian kernel models. Natesh Pillai, Qiang Wu, Feng Liang, Sayan Mukherjee, Robert L. Wolpert. Journal of Machine Learning Research, in press.

Understanding the use of unlabelled data in predictive modelling. Feng Liang, Sayan Mukherjee, and Mike West. Statistical Science, in press.

Non-parametric Bayesian kernel models. Feng Liang, Kai Mao, Ming Liao, Sayan Mukherjee and Mike West. Biometrika, submitted.



Table of contents

1 Kernel models and penalized loss
2 Bayesian kernel model
  Direct prior elicitation
  Priors and integral operators
3 Priors on measures
  Levy processes
  Gaussian processes
  Bayesian representer theorem
4 Estimation and inference
  Likelihood and prior specification
  Variable selection
  Semi-supervised learning
5 Results on data
6 Open problems



Regression

data = {Li = (xi, yi)}_{i=1}^{n} with Li iid∼ ρ(X, Y).

X ∈ X ⊂ IR^p and Y ⊂ IR, with p ≫ n.

A natural idea: f(x) = E_Y[Y | x].




An excellent estimator

f(x) = arg min_{f ∈ big function space} [error on data + smoothness of function]

error on data = L(f, data) = (f(x) − y)²

smoothness of function = ‖f‖²_K = ∫ |f′(x)|² dx

big function space = reproducing kernel Hilbert space = HK



An excellent estimator

f(x) = arg min_{f∈HK} [ L(f, data) + λ‖f‖²_K ]

The kernel: K : X × X → IR, e.g. K(u, v) = e^(−‖u−v‖²).

The RKHS:

HK = { f | f(x) = Σ_{i=1}^{ℓ} αi K(x, xi), xi ∈ X, αi ∈ R, ℓ ∈ N }.



Representer theorem

f(x) = arg min_{f∈HK} [ L(f, data) + λ‖f‖²_K ]

has a minimizer of the form

f(x) = Σ_{i=1}^{n} ai K(x, xi).

Great when p ≫ n.



Very popular and useful

1 Support vector machines

f(x) = arg min_{f∈HK} [ Σ_{i=1}^{n} |1 − yi·f(xi)|_+ + λ‖f‖²_K ],

2 Regularized kernel regression

f(x) = arg min_{f∈HK} [ Σ_{i=1}^{n} |yi − f(xi)|² + λ‖f‖²_K ],

3 Regularized logistic regression

f(x) = arg min_{f∈HK} [ Σ_{i=1}^{n} ln(1 + e^{−yi·f(xi)}) + λ‖f‖²_K ].




Bayesian interpretation of RBF

yi = f(xi) + εi, εi iid∼ No(0, σ²).

Lik(data | f) ∝ ∏_{i=1}^{n} exp(−(yi − f(xi))²/2σ²),   π(f) ∝ exp(−‖f‖²_K).

Prob(f | data) ∝ Lik(data | f) · π(f).

Maximum a posteriori (MAP) estimator

f = arg max_{f∈HK} Prob(f | data).

I want the full posterior.




Priors via spectral expansion

HK = { f | f(x) = Σ_{i=1}^{∞} ai φi(x) with Σ_{i=1}^{∞} ai²/λi < ∞ },

where φi(x) and λi ≥ 0 are eigenfunctions and eigenvalues of K:

λi φi(x) = ∫_X K(x, u) φi(u) dγ(u).

Specify a prior on HK via a prior on A

A = { (ak)_{k=1}^{∞} : Σ_k ak²/λk < ∞ }.

Hard to sample from, and relies on computation of eigenvalues and eigenvectors.




Priors via duality

The duality between Gaussian processes and RKHS implies the following construction

f(·) ∼ GP(µf, K),

where K is given by the kernel.

f(·) ∉ HK almost surely.




Integral operators

Integral operator L_K : Γ → G

G = { f | f(x) := L_K[γ](x) = ∫_X K(x, u) dγ(u), γ ∈ Γ },

with Γ ⊆ B(X).

A prior on Γ implies a prior on G.




Equivalence with RKHS

For what Γ is HK = span(G)?

What is L_K^{−1}(HK)? This is hard to characterize.

The candidates for Γ will be

1 square integrable functions

2 integrable functions

3 discrete measures

4 the union of integrable functions and discrete measures.




Square integrable functions are too small

Proposition

For every γ ∈ L²(X), L_K[γ] ∈ HK. Consequently, L²(X) ⊂ L_K^{−1}(HK).

Corollary

If Λ = {k : λk > 0} is a finite set, then L_K(L²(X)) = HK; otherwise L_K(L²(X)) ⊊ HK. The latter occurs when the kernel K is strictly positive definite and the RKHS is infinite-dimensional.




Signed measures are (almost) just right

Measures: the class of functions L¹(X) corresponds to signed measures.

Proposition

For every γ ∈ L¹(X), L_K[γ] ∈ HK. Consequently, L¹(X) ⊂ L_K^{−1}(HK).

Discrete measures:

MD = { µ = Σ_{i=1}^{n} ci δ_{xi} : Σ_{i=1}^{n} |ci| < ∞, xi ∈ X, n ∈ N }.

Proposition

Given the set of finite discrete measures, MD ⊂ L_K^{−1}(HK).




Signed measures are (almost) just right

Nonsingular measures: M = L¹(X) ∪ MD

Proposition

LK (M) is dense in HK with respect to the RKHS norm.

Proposition

B(X) ⊊ L_K^{−1}(HK(X)).



The implication

Take home message – need priors on signed measures.

A function-theoretic foundation for random signed measures such as Gaussian, Dirichlet and Levy process priors.




Bayesian kernel model

yi = f(xi) + εi, εi iid∼ No(0, σ²).

f(x) = ∫_X K(x, u) Z(du)

where Z(du) ∈ M(X) is a signed measure on X.

π(Z | data) ∝ L(data | Z) π(Z),

and this implies a posterior on f.



Levy processes

A stochastic process Z := {Zu ∈ R : u ∈ X} is called a Levy process if it satisfies the following conditions:

1 Z0 = 0 almost surely.

2 For any choice of m ≥ 1 and 0 ≤ u0 < u1 < ... < um, the random variables Zu0, Zu1 − Zu0, ..., Zum − Zum−1 are independent (independent increments property).

3 The distribution of Zs+u − Zs is independent of Zs (temporal homogeneity or stationary increments property).

4 Z has càdlàg paths almost surely.



Levy processes

Theorem (Levy-Khintchine)

If Z is a Levy process, then the characteristic function of Zu : u ≥ 0 has the following form:

E[e^{iλZu}] = exp( u [ iλa − (1/2)σ²λ² + ∫_{R\{0}} ( e^{iλw} − 1 − iλw·1_{{w : |w|<1}}(w) ) ν(dw) ] ),

where a ∈ IR, σ² ≥ 0 and ν is a nonnegative measure on IR with ∫_R (1 ∧ |w|²) ν(dw) < ∞.



Levy processes

drift term a

variance of Brownian motion σ²

ν(dw) the jump process or Levy measure.

Gaussian factor: exp{ u [ iλa − (1/2)σ²λ² ] }

Jump factor: exp{ u ∫_{R\{0}} [ e^{iλw} − 1 − iλw·1_{{w : |w|<1}}(w) ] ν(dw) }
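As a small sanity check (not in the slides), for a pure Gaussian Levy process (ν = 0) one can compare the empirical characteristic function of Zu ∼ No(au, σ²u) with the Gaussian factor above; all parameter values in the sketch are illustrative.

```python
import numpy as np

# Minimal sketch: with no jump part (nu = 0), Z_u ~ Normal(a*u, sigma^2 * u)
# and E[exp(i*lam*Z_u)] = exp(u*(i*lam*a - sigma^2*lam^2/2)).
rng = np.random.default_rng(2)
a, sigma, u, lam = 0.5, 1.2, 2.0, 0.7

samples = rng.normal(loc=a * u, scale=sigma * np.sqrt(u), size=200_000)
empirical = np.mean(np.exp(1j * lam * samples))
theoretical = np.exp(u * (1j * lam * a - 0.5 * sigma**2 * lam**2))
print(empirical, theoretical)   # agree up to Monte Carlo error
```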



Two approaches to Gaussian processes

Two modelling approaches

1 prior directly on the space of functions, by sampling from paths of the Gaussian process defined by K;

2 Gaussian process prior on Z(du), which implies a prior on function space via the integral operator.



Prior on random measure

A Gaussian process prior on Z(du) is a prior on a signed measure, so span(G) ⊂ HK.



Direct prior elicitation

Theorem (Kallianpur)

If {Zu, u ∈ X} is a Gaussian process with covariance K and mean m ∈ HK, and HK is infinite dimensional, then

P(Z• ∈ HK) = 0.

The sample paths are almost surely outside HK .



A bigger RKHS

Theorem (Lukic and Beder)

Given two kernel functions R and K, R dominates K (R ⪰ K) if HK ⊆ HR. Let R ⪰ K. Then

‖g‖R ≤ ‖g‖K, ∀g ∈ HK.

There exists a unique linear operator L : HR → HR whose range is contained in HK such that

〈f, g〉R = 〈Lf, g〉K, ∀f ∈ HR, ∀g ∈ HK.

In particular, LRu = Ku, ∀u ∈ X.

As an operator into HR, L is bounded, symmetric, and positive. Conversely, let L : HR → HR be a positive, continuous, self-adjoint operator; then

K(s, t) = 〈LRs, Rt〉R,   s, t ∈ X

defines a reproducing kernel on X such that K ⪯ R.





A bigger RKHS

If L is nuclear (an operator that is compact with finite trace independent of basis choice) then we have nuclear dominance, written R ≫ K.

Theorem (Lukic and Beder)

Let K and R be two reproducing kernels. Assume that the RKHS HR is separable. A necessary and sufficient condition for the existence of a Gaussian process with covariance K and mean m ∈ HR and with trajectories in HR with probability 1 is that R ≫ K (nuclear dominance).

Characterize HR by L_K^{−1}(HK).




Dirichlet distribution

Multinomial distribution

g(x1, ..., xk | n, p1, ..., pk) = ( n! / (x1! ··· xk!) ) p1^{x1} ··· pk^{xk},   Σ_{i=1}^{k} xi = n, xi ≥ 0.

Dirichlet distribution

f(p1, ..., pk | α1, ..., αk) = (1/B(α)) ∏_{i=1}^{k} pi^{αi−1},   Σ_{i=1}^{k} pi = 1, pi ≥ 0,

B(α) = ∏_{i=1}^{k} Γ(αi) / Γ( Σ_{i=1}^{k} αi ).




Conjugacy

If Prob(θ | data) and π(θ) belong to the same family they are conjugate.

Let x = {x1, ..., xk} and p = {p1, ..., pk}:

p ∼ Dir(α)

x | p ∼ Mult(p)

p | x ∼ Dir(x + α).
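A minimal numerical sketch of this conjugate update; the counts, the true probabilities and the prior concentration are illustrative assumptions.

```python
import numpy as np

# Sketch of Dirichlet-multinomial conjugacy.
rng = np.random.default_rng(3)
alpha = np.array([2.0, 2.0, 2.0])          # prior Dir(alpha)
p_true = np.array([0.2, 0.5, 0.3])
x = rng.multinomial(n=100, pvals=p_true)   # observed counts

posterior_alpha = x + alpha                # p | x ~ Dir(x + alpha)
posterior_mean = posterior_alpha / posterior_alpha.sum()
print(x, posterior_mean)                   # posterior mean close to empirical proportions
```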



Dirichlet process prior

Given a distribution function F and a specified distribution F0 with the same support on a space X, a Dirichlet process prior DP(α, F0) implies that for any partition of the space B1, ..., Bk

(F(B1), ..., F(Bk)) ∼ Dir(αF0(B1), ..., αF0(Bk)).
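A hedged sketch of this defining property, drawing the finite-dimensional vector (F(B1), ..., F(Bk)); the base-measure masses, the concentration α and the partition are assumptions made for the example.

```python
import numpy as np

# Finite-dimensional distributions of DP(alpha, F0) over an assumed 3-set partition.
rng = np.random.default_rng(4)
alpha = 5.0
F0_mass = np.array([0.3, 0.4, 0.3])        # F0(B1), F0(B2), F0(B3)

draws = rng.dirichlet(alpha * F0_mass, size=5)
print(draws)                # each row is one realization (F(B1), F(B2), F(B3))
print(draws.mean(axis=0))   # E[F(Bk)] = F0(Bk), so column means approach F0_mass
```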




Dirichlet process prior

f(x) = ∫_X K(x, u) Z(du) = ∫_X K(x, u) w(u) F(du)

where F(du) is a distribution and w(u) a coefficient function.

Model F using a Dirichlet process prior: DP(α,F0)



Bayesian representer theorem

Given Xn = (x1, ..., xn) iid∼ F,

F | Xn ∼ DP(α + n, Fn),   Fn = ( αF0 + Σ_{i=1}^{n} δ_{xi} ) / (α + n).

E[f | Xn] = an ∫ K(x, u) w(u) dF0(u) + n^{−1}(1 − an) Σ_{i=1}^{n} w(xi) K(x, xi),

an = α/(α + n).



Bayesian representer theorem

Taking the limit α → 0 to represent a non-informative prior:

Theorem (Bayesian representer theorem)

fn(x) = Σ_{i=1}^{n} wi K(x, xi),   wi = w(xi)/n.



Likelihood

yi = f(xi) + εi = w0 + Σ_{j=1}^{n} wj K(xi, xj) + εi,   i = 1, ..., n,

where εi ∼ No(0, σ²).

Y ∼ No(w0 ι + Kw, σ²I),

where ι = (1, ..., 1)′.





Prior specification

Factor: K = F∆F′ with ∆ := diag(λ1², ..., λn²) and w = F∆^{−1}β.

π(w0, σ²) ∝ 1/σ²

τi^{−1} ∼ Ga(aτ/2, bτ/2)

T := diag(τ1, ..., τn)

β ∼ No(0, T)

w | K, T ∼ No(0, F∆^{−1}T∆^{−1}F′).

A standard Gibbs sampler simulates p(w, w0, σ² | data).




Sampling from posterior

Objective: sample from p(w, w0, σ² | data).

Given posterior draws {w^(i), w0^(i), pi(w, w0)}_{i=1}^{T}, we have T functions and can compute the Bayes average and variance pointwise:

f̄(x) = Σ_{i=1}^{T} pi(w, w0) [ w0^(i) + Σ_{j=1}^{n} K(x, xj) wj^(i) ]

var[f(x)] = Σ_{i=1}^{T} pi(w, w0) [ f̄(x) − fi(x) ]².
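A sketch of these pointwise summaries computed from stored posterior draws; the Gaussian kernel, its bandwidth, and equal weights pi = 1/T are assumptions made for the example.

```python
import numpy as np

def posterior_summary(x, X_train, W, w0, weights=None, bandwidth=1.0):
    """Pointwise Bayes average and variance of f(x) from T posterior draws.
    W is (T, n): kernel coefficients per draw; w0 is (T,): intercepts per draw."""
    T = W.shape[0]
    if weights is None:
        weights = np.full(T, 1.0 / T)          # plain MCMC output: equal weights
    k = np.exp(-np.sum((X_train - x) ** 2, axis=1) / bandwidth)   # K(x, x_j), j = 1..n
    f_draws = w0 + W @ k                       # f_i(x), one value per draw
    f_bar = np.sum(weights * f_draws)          # Bayes average at x
    f_var = np.sum(weights * (f_draws - f_bar) ** 2)
    return f_bar, f_var
```

In practice W and w0 would be the stored output of the Gibbs sampler described above.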



Markov chain Monte Carlo

It may be difficult to sample from p(w, w0, σ² | data), for example in high dimensions. Also the normalizing constant Z is unavailable:

p(w, w0, σ² | data) = Lik(data | w, w0, σ²) · π(w, w0, σ²) / Z

Z = ∫ Lik(data | w, w0, σ²) · π(w, w0, σ²) dw0 dw dσ.



Markov chain Monte Carlo

Say we want to sample p(θ | data) but it's hard. Say we have a Markov chain (aperiodic, irreducible, satisfying detailed balance) with transitions

q(θ* | θ) = Prob(θ* | θ),

Prob(θ) q(θ* | θ) = Prob(θ*) q(θ | θ*).



Markov chain Monte Carlo

Metropolis-Hastings

1 Given θ^(t), sample θ* from q(θ* | θ^(t)).

2 Accept θ^(t+1) = θ* with probability

A = min[ 1, p(θ*) q(θ^(t) | θ*) / ( p(θ^(t)) q(θ* | θ^(t)) ) ],

otherwise θ^(t+1) = θ^(t).
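A minimal random-walk variant of this recipe (so the proposal ratio cancels), targeting an illustrative standard-normal density; the step size and chain length are assumptions for the sketch.

```python
import numpy as np

def metropolis_hastings(log_p, theta0, n_steps=5000, step=0.5, rng=None):
    """Random-walk Metropolis-Hastings targeting p (given via an unnormalized log_p).
    The symmetric Gaussian proposal makes the q-ratio cancel in the acceptance rule."""
    rng = rng or np.random.default_rng()
    theta = np.asarray(theta0, dtype=float)
    samples = []
    for _ in range(n_steps):
        proposal = theta + step * rng.normal(size=theta.shape)
        # A = min(1, p(theta*)/p(theta)) for a symmetric proposal
        if np.log(rng.uniform()) < log_p(proposal) - log_p(theta):
            theta = proposal
        samples.append(theta.copy())
    return np.array(samples)

# Example: sample a standard normal target.
draws = metropolis_hastings(lambda t: -0.5 * np.sum(t ** 2), theta0=[0.0])
print(draws.mean(), draws.std())
```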




Gibbs sampling

Given d-dimensional θ with known conditionals

p(θj | θ−j) = p(θj | θ1, ..., θj−1, θj+1, ..., θd).

Proposal distribution

q(θ* | θ^(t)) = p(θj* | θ^(t)_{−j}).



Gibbs sampling

Acceptance probability

A = min[ 1, p(θ*) q(θ^(t) | θ*) / ( p(θ^(t)) q(θ* | θ^(t)) ) ]

  = min[ 1, p(θ*) p(θj^(t) | θ^(t)_{−j}) / ( p(θ^(t)) p(θj* | θ*_{−j}) ) ]

  = min[ 1, p(θ*_{−j}) / p(θ^(t)_{−j}) ] = 1,

since θ*_{−j} = θ^(t)_{−j}: the Gibbs proposal is always accepted.




Gibbs sampling example

We want to sample x ∈ {1, 2, ..., n} and y ∈ [0, 1] from

p(x, y | n, α, β) = ( n! / ((n − x)! x!) ) y^{x+α−1} (1 − y)^{n−x+β−1}.

Conditionals

x | y ∼ Bin(n, y) ∝ ( n! / ((n − x)! x!) ) y^{x} (1 − y)^{n−x}

y | x ∼ Be(x + α, n − x + β) ∝ y^{x+α−1} (1 − y)^{n−x+β−1}






Gibbs sampling example

1 given y_t
2 draw x_{t+1} ∼ Bin(n, y_t)
3 draw y_{t+1} ∼ Be(x_{t+1} + α, n − x_{t+1} + β)
4 return to (2).
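A direct sketch of these steps; the values of n, α, β and the chain length are illustrative assumptions.

```python
import numpy as np

# Two-step Gibbs sampler for p(x, y | n, alpha, beta) as defined above.
rng = np.random.default_rng(5)
n, alpha, beta = 20, 2.0, 3.0
T = 10_000

x, y = 0, 0.5                                   # arbitrary starting point
xs, ys = [], []
for _ in range(T):
    x = rng.binomial(n, y)                      # x | y ~ Bin(n, y)
    y = rng.beta(x + alpha, n - x + beta)       # y | x ~ Be(x + alpha, n - x + beta)
    xs.append(x)
    ys.append(y)

print(np.mean(xs), np.mean(ys))                 # marginal means under p(x, y | n, alpha, beta)
```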



Kernel model extension

Kν(x, u) = K(√ν ⊗ x, √ν ⊗ u)

with ν = {ν1, ..., νp}, where each νk ∈ [0, ∞) is a scale parameter.

kν(x, u) = Σ_{k=1}^{p} νk xk uk,

kν(x, u) = ( 1 + Σ_{k=1}^{p} νk xk uk )^d,

kν(x, u) = exp( −Σ_{k=1}^{p} νk (xk − uk)² ).
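A sketch of the scaled Gaussian kernel above, where setting νk = 0 removes feature k from the model; the feature values and scale parameters are illustrative.

```python
import numpy as np

def weighted_gaussian_kernel(x, u, nu):
    """k_nu(x, u) = exp(-sum_k nu_k (x_k - u_k)^2); nu_k = 0 drops feature k entirely."""
    return np.exp(-np.sum(nu * (x - u) ** 2))

# Illustrative: only the first two of five features carry signal.
nu = np.array([1.0, 0.5, 0.0, 0.0, 0.0])
x = np.array([0.1, 0.2, 5.0, -3.0, 7.0])
u = np.array([0.0, 0.3, -2.0, 4.0, 0.0])
print(weighted_gaussian_kernel(x, u, nu))   # unaffected by the last three coordinates
```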




Prior specification

νk ∼ (1 − γ)δ0 + γ Ga(aν, aνs),   k = 1, ..., p,

s ∼ Exp(as),   γ ∼ Be(aγ, bγ).

Standard Gibbs sampler does not work: Metropolis-Hastings.




Problem setup

Labelled data: (Y^p, X^p) = {(yi^p, xi^p); i = 1, ..., np} iid∼ ρ(Y, X | φ, θ).

Unlabelled data: X^m = {xi^m; i = 1, ..., nm} iid∼ ρ(X | θ).

How can the unlabelled data help our predictive model?

data = {Y, X, X^m}

p(φ, θ | data) ∝ π(φ, θ) p(Y | X, φ) p(X | θ) p(X^m | θ).

Need very strong dependence between θ and φ.




Bayesian kernel model

Result of DP prior

fn(x) = Σ_{i=1}^{np+nm} wi K(x, xi).

Same as in Belkin and Niyogi but without

min_{f∈HK} [ L(f, data) + λ1‖f‖²_K + λ2‖f‖²_I ].



1 θ = F(·) so that p(x | θ)dx = dF(x) - the parameter is the full distribution function itself;

2 p(y | x, φ) depends intimately on θ = F; in fact, θ ⊆ φ in this case and the dependence of θ and φ is central to the model.




Simulated data – semi-supervised

[Figure: two panels of simulated two-dimensional data; both axes range from −2 to 2.]





Cancer classification – semi-supervised

[Figure: average (hard) error rate versus percentage of labelled examples, with and without unlabelled data.]



Simulated data – feature selection

[Figure: two panels of simulated two-dimensional data; both axes range from −2 to 2.]



[Figure: feature-selection results on the simulated data; horizontal axis 0–20, vertical axis 0–7.]



MNIST digits – feature selection

[Figure: two MNIST digit images; axes are pixel indices.]



[Figure: results on the MNIST digits with and without variable selection; vertical axis 0–0.35.]







Discussion

Lots of work left:

Further refinement of integral operators and priors in terms of Sobolev spaces.

Semi-supervised setting: relation of kernel model and priors with Laplace-Beltrami and graph Laplacian operators.

Semi-supervised setting: duality between diffusion processes on manifolds and Markov chains.

Bayesian variable selection: efficient sampling and search in high-dimensional space.

Numeric stability and statistical robustness.



Summary

It's extra work but it pays to be Bayes :)
