
Conditional Expectation Manifolds and Brain Population Analysis

Samuel Gerber, University of Utah

Manifold Learning

Some observations on popular algorithms


Isomap
• Approximate geodesic distances by shortest paths in the nearest-neighbor graph
• Preserve approximate geodesics:
$$\min_{x} \sum_{i,j} \left[\delta(y_i,y_j) - d(x_i,x_j)\right]^2$$
where $\delta$ is the graph (approximate geodesic) distance and $d$ the Euclidean distance in the embedding
• Multidimensional scaling
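The two Isomap ingredients above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes Euclidean input points, a symmetric k-nearest-neighbor graph, and Dijkstra from every node to obtain $\delta$, and it only evaluates the stress of a candidate embedding rather than solving the MDS step.

```python
import heapq
import math


def knn_graph(points, k):
    """Symmetric k-nearest-neighbor graph: adj[i][j] = Euclidean edge length."""
    n = len(points)
    adj = [{} for _ in range(n)]
    for i in range(n):
        nearest = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for d, j in nearest[:k]:
            adj[i][j] = d
            adj[j][i] = d  # keep the graph undirected
    return adj


def geodesic_distances(adj):
    """delta(y_i, y_j): shortest-path lengths via Dijkstra from every node."""
    n = len(adj)
    delta = []
    for source in range(n):
        dist = [math.inf] * n
        dist[source] = 0.0
        heap = [(0.0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale heap entry
            for v, w in adj[u].items():
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(heap, (dist[v], v))
        delta.append(dist)
    return delta


def stress(delta, x):
    """sum_{i,j} [delta(y_i, y_j) - d(x_i, x_j)]^2 for an embedding x."""
    n = len(x)
    return sum((delta[i][j] - math.dist(x[i], x[j])) ** 2
               for i in range(n) for j in range(n))
```

For eleven points on a quarter of the unit circle with k = 2, the graph distance between the endpoints comes out close to the arc length π/2 rather than the chord √2, and an unrolled 1D embedding has much lower stress than the original 2D coordinates.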


Properties
• Relies only on accurate local distances
• Shortcuts in the graph lead to very bad approximations
• Quality measure based on the graph embedding
• Hard to detect

[Figure: 2D point plot; plot of distortion versus dimension (1 to 10)]

Properties

• Classical multidimensional scaling does not minimize the stress $\sum_{i,j}\left[\delta(y_i,y_j) - d(x_i,x_j)\right]^2$

• Optimization-based approaches



A. Agarwal, J. Phillips and S. Venkatasubramanian, Universal Multi-Dimensional Scaling, Conference on Knowledge Discovery and Data Mining 2010

J. Kruskal, Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis, Psychometrika 1964

http://www.cs.utah.edu/~suresh/web/2010/02/06/universal-mds/




http://www.kdd2010.com/
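As a rough illustration of an optimization-based approach, the sketch below runs plain gradient descent on the raw stress. Classical MDS instead eigendecomposes the double-centered squared-distance matrix (it minimizes the so-called strain), which is why it does not minimize the stress. The step size, iteration count, and random initialization are arbitrary assumptions; this is neither Kruskal's nonmetric procedure nor Universal MDS.

```python
import math
import random


def pair_stress(delta, x):
    """Raw stress over unordered pairs: sum_{i<j} [delta_ij - ||x_i - x_j||]^2."""
    n = len(x)
    return sum((delta[i][j] - math.dist(x[i], x[j])) ** 2
               for i in range(n) for j in range(i + 1, n))


def stress_descent(delta, dim=2, steps=500, lr=0.01, seed=0):
    """Plain gradient descent on pair_stress from a random start."""
    n = len(delta)
    rng = random.Random(seed)
    x = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    for _ in range(steps):
        grad = [[0.0] * dim for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                diff = [a - b for a, b in zip(x[i], x[j])]
                d = math.sqrt(sum(c * c for c in diff)) or 1e-12  # avoid division by zero
                coef = -2.0 * (delta[i][j] - d) / d  # gradient w.r.t. x_i is coef * diff
                for k in range(dim):
                    grad[i][k] += coef * diff[k]
                    grad[j][k] -= coef * diff[k]
        for i in range(n):
            for k in range(dim):
                x[i][k] -= lr * grad[i][k]
    return x
```

Calling `stress_descent(delta, steps=0)` returns the random starting configuration, which makes it easy to compare the stress before and after optimization.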




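Laplacian Eigenmaps
• Given a manifold $M$, find functions $f: M \to \mathbb{R}$ such that $\int_M \|\nabla f(y)\|^2 \, dy$ is minimized (equivalently $\int_M \Delta f(y)\, f(y)\, dy$, so the minimizers are eigenfunctions of the Laplace-Beltrami operator $\Delta$)

• The low-dimensional embedding is $x = [f_1(y), \dots, f_n(y)] \in \mathbb{R}^n$

• A small gradient implies that nearby points are mapped close together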
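On sampled data the energy $\int_M \|\nabla f\|^2$ is usually replaced by a graph quantity. A minimal sketch, assuming heat-kernel weights with a hand-picked bandwidth sigma: with $L = D - W$ one has $f^\top L f = \frac{1}{2}\sum_{i,j} W_{ij}(f_i - f_j)^2$, so functions that vary slowly between nearby samples have low energy. Computing the eigenvectors of $L$ (the actual embedding step) is left to a linear-algebra library.

```python
import math


def heat_kernel_weights(points, sigma):
    """W_ij = exp(-||p_i - p_j||^2 / (2 sigma^2)) for i != j, 0 on the diagonal."""
    n = len(points)
    return [[0.0 if i == j else math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
             for j in range(n)] for i in range(n)]


def graph_laplacian(W):
    """L = D - W with D the diagonal degree matrix."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]


def dirichlet_energy(L, f):
    """f^T L f, the discrete analogue of int_M ||grad f||^2."""
    n = len(f)
    return sum(f[i] * L[i][j] * f[j] for i in range(n) for j in range(n))


def pairwise_energy(W, f):
    """1/2 sum_ij W_ij (f_i - f_j)^2, which equals f^T L f."""
    n = len(f)
    return 0.5 * sum(W[i][j] * (f[i] - f[j]) ** 2 for i in range(n) for j in range(n))
```

Per unit norm, a linear ramp over 20 samples on a line has far lower energy than an alternating ±0.5 signal, and a constant function has zero energy.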




Properties

• Again, only local distances matter

• No quality measure for the embedding



Eigenfunction Issue

• Minimizing $\int_M \|\nabla f(y)\|^2 \, dy$

• Orthogonality constraint on $f$ in function space (not geometrically on the manifold)

• Eigenvectors with higher frequency along the same extension of the manifold can have a smaller cost




Eigenfunction Issue

• B is orthogonal to A (in function space)

• Cost of B less than C (the desired eigenvector)

Samuel Gerber, Tolga Tasdizen, Ross Whitaker, Robust Non-linear Dimensionality Reduction using Successive 1-Dimensional Laplacian Eigenmaps, ICML 2007

[Figure: eigenfunctions A, B and C plotted as f over the manifold coordinates x and y]
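A worked example of the issue, under a simplifying assumption not taken from the slides: let the manifold be a flat 2.5 × 1 rectangle with Neumann boundary, where the Laplacian eigenfunctions are known in closed form. The mode (2, 0) is orthogonal to (1, 0) in function space but varies along the same (long) direction, yet its cost is lower than that of (0, 1), the mode that would parametrize the short direction. A two-eigenvector embedding would therefore pick (1, 0) and (2, 0), in the roles of A and B, and miss C.

```python
import math


def neumann_rectangle_modes(a, b, kmax=4):
    """Laplacian eigenpairs on the rectangle [0, a] x [0, b] with Neumann boundary.

    Mode (k, l) has eigenfunction cos(k pi x / a) cos(l pi y / b) and
    eigenvalue pi^2 (k^2 / a^2 + l^2 / b^2), which is also its Dirichlet
    energy int ||grad f||^2 after normalizing f to unit L2 norm.
    """
    modes = [(math.pi ** 2 * (k * k / (a * a) + l * l / (b * b)), (k, l))
             for k in range(kmax + 1) for l in range(kmax + 1)]
    modes.sort()
    return [(kl, lam) for lam, kl in modes]


for (k, l), lam in neumann_rectangle_modes(2.5, 1.0)[:4]:
    print(f"mode (k={k}, l={l}): cost {lam:.3f}")
```

The first four modes come out in the order (0, 0), (1, 0), (2, 0), (0, 1): the constant, then two functions of x only, and only then the first function of y.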

Conditional Expectation Manifolds

Manifold learning as unsupervised non-parametric model fitting

Principal Curves/Surfaces
Curve through the middle of a density

T. Hastie, W. Stuetzle, Principal curves, Journal of the American Statistical Association 1989

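Principal Surface Definition
• Minimal orthogonal projection onto the surface: $\lambda(y) = \arg\min_x \|g(x) - y\|$

• $g$ is a principal surface iff the conditional expectation of $Y$ given its projection equals the surface: $g(x) = E[Y \mid \lambda(Y) = x]$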

Principal Surface Estimation

• Principal surfaces are extremal points of the objective $E[\|g(\lambda(Y)) - Y\|^2]$

• Pick a parametrized surface model $g$

• Optimize over the parameters of $g$

• Unfortunately, principal surfaces are all saddle points of $E[\|g(\lambda(Y)) - Y\|^2]$

• The projection $\lambda$ is a non-linear optimization problem
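To see why the projection step is itself an optimization problem, here is a minimal sketch for a parametric curve, assuming a coarse grid search followed by golden-section refinement (an arbitrary choice; any 1D minimizer would do). The second helper evaluates the empirical objective $E[\|g(\lambda(Y)) - Y\|^2]$.

```python
import math


def project_onto_curve(g, y, lo, hi, grid=200, refine=40):
    """lambda(y) = argmin_t ||g(t) - y||^2 for a parametric curve g on [lo, hi].

    A coarse grid search picks a bracket around the best grid point, then
    golden-section search refines inside that bracket.
    """
    def cost(t):
        return sum((gi - yi) ** 2 for gi, yi in zip(g(t), y))

    ts = [lo + (hi - lo) * k / grid for k in range(grid + 1)]
    best = min(range(grid + 1), key=lambda k: cost(ts[k]))
    a, b = ts[max(best - 1, 0)], ts[min(best + 1, grid)]
    ratio = (math.sqrt(5) - 1) / 2
    for _ in range(refine):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if cost(c) < cost(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2


def mean_projection_error(g, ys, lo, hi):
    """Empirical E[||g(lambda(Y)) - Y||^2]."""
    total = 0.0
    for y in ys:
        t = project_onto_curve(g, y, lo, hi)
        total += sum((gi - yi) ** 2 for gi, yi in zip(g(t), y))
    return total / len(ys)
```

For the unit circle and samples on the circle of radius 2, every projection distance is 1, so the empirical objective is 1; a point at (0, 3) projects to the parameter π/2.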

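Conditional Expectation Manifolds (CEM)

• Define a coordinate mapping $f$ from data space to low-dimensional coordinates $x$

• Model the surface as the conditional expectation given the coordinate mapping: $g(x) = E[Y \mid f(Y) = x]$

• Optimize the coordinate mapping: $\min_f E[\|g(f(Y)) - Y\|^2]$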

CEM Estimation

• Coordinate mapping as kernel regression:
$$f(y) = \frac{\sum_{i=1}^n K(y - y_i)\, z_i}{\sum_{j=1}^n K(y - y_j)}$$

Samuel Gerber, Tolga Tasdizen, Ross Whitaker "Dimensionality Reduction and Principal Surfaces via Kernel Map Manifolds", (ICCV 2009)

CEM Estimation

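• Conditional expectation estimated with kernel regression:
$$g(x) = \frac{\sum_{i=1}^n K(x - f(y_i))\, y_i}{\sum_{j=1}^n K(x - f(y_j))}$$

• Optimize over the $z_i$: $\min_{z_1,\dots,z_n} \sum_i \|g(f(y_i)) - y_i\|^2$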

Samuel Gerber, Tolga Tasdizen, Ross Whitaker "Dimensionality Reduction and Principal Surfaces via Kernel Map Manifolds", (ICCV 2009)
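The two kernel regressions and the objective can be written out directly. A minimal sketch, assuming Gaussian kernels with bandwidths `sigma_f` and `sigma_g` (hypothetical names) and Euclidean data; it only evaluates the objective for given $z_i$ and does not optimize over the $z_i$ or the bandwidths.

```python
import math


def gauss(sq, sigma):
    """Gaussian kernel evaluated on a squared distance."""
    return math.exp(-sq / (2 * sigma * sigma))


def sqdist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def coord_map(y, ys, zs, sigma_f):
    """f(y) = sum_i K(y - y_i) z_i / sum_j K(y - y_j)."""
    w = [gauss(sqdist(y, yi), sigma_f) for yi in ys]
    s = sum(w)
    return [sum(wi * z[k] for wi, z in zip(w, zs)) / s for k in range(len(zs[0]))]


def cond_expectation(x, ys, xs, sigma_g):
    """g(x) = sum_i K(x - f(y_i)) y_i / sum_j K(x - f(y_j)), with xs[i] = f(y_i)."""
    w = [gauss(sqdist(x, xi), sigma_g) for xi in xs]
    s = sum(w)
    return [sum(wi * y[k] for wi, y in zip(w, ys)) / s for k in range(len(ys[0]))]


def cem_objective(ys, zs, sigma_f, sigma_g):
    """(1/n) sum_i ||g(f(y_i)) - y_i||^2 for the current z_i."""
    xs = [coord_map(y, ys, zs, sigma_f) for y in ys]
    return sum(sqdist(cond_expectation(x, ys, xs, sigma_g), y) for x, y in zip(xs, ys)) / len(ys)
```

On 20 collinear samples, setting each $z_i$ to the sample's position along the line gives a much lower objective than setting every $z_i$ to the same value; in the latter case $g$ collapses to the sample mean and the objective equals the sample variance.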

Some results

• Effect of optimization

[Figure panels: Input; Initial, MSE 8.6; Optimized, MSE 2.6]

Some results
• 1965 images of different facial expressions (20×28 pixels)

Work in Progress
• The saddle-point property of the extrema is problematic for model selection

[Figure 2 panels: (a) curves in the (y1, y2) plane: ground truth, initialization, intermediates, selected; (b) $d(\lambda, Y)^2$ versus iteration for train and test data]

Figure 2: Minimization of $d(\lambda, Y)^2$ with automatic bandwidth selection starting from $\sigma_g = 1$ and $\sigma_\lambda = 0.1$. (a) Fitted curve with optimization path and (b) train and test error, with points indicating minimal train and test error, respectively.

[Figure 3 panels: (a) curves in the (y1, y2) plane; (b) $q(\lambda, Y)^2$ versus iteration for train and test data]

Figure 3: Minimization of $q(\lambda, Y)^2$ with automatic bandwidth selection starting from $\sigma_g = 1$ and $\sigma_\lambda = 0.1$. (a) Fitted curve with optimization path and (b) train and test error, with points indicating minimal train and test error, respectively.


Work in Progress
• Conditional expectation manifolds pave the way for other objective functions

[Figure 4 panels: (a) curves in the (y1, y2) plane; (b) $d(\lambda, Y)^2$ versus iteration for train and test data]

Figure 4: Minimization of $d(\lambda, Y)^2$ with automatic bandwidth selection starting from $\sigma_g = 0.1$ and $\sigma_\lambda = 0.1$. (a) Fitted curve with optimization path and (b) train and test error, with points indicating minimal train and test error, respectively.

[Figure 5 panels: (a) curves in the (y1, y2) plane; (b) $q(\lambda, Y)^2$ versus iteration for train and test data]

Figure 5: Minimization of $q(\lambda, Y)^2$ with automatic bandwidth selection starting from $\sigma_g = 0.1$ and $\sigma_\lambda = 0.1$. (a) Fitted curve with optimization path and (b) train and test error, with points indicating minimal train and test error, respectively.


Brain Population Analysis

Motivation

• Proof of concept

• Conditional expectation manifold for brain images

• Non-linearity in shape space

• Natural extension (at the time) from a single atlas to multiple atlases to a continuum

• Simplify statistics on shape spaces

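Measuring Shape Differences

• Euclidean distance between images does not capture changes in shape

• Distance based on measuring the length of a transformation

• Diffeomorphic transform: $\phi(r,t) = r + \int_0^t v(\phi(r,\tau),\tau)\, d\tau$

• Riemannian metric ($\|u(r)\|_Q^2 = \alpha\|\nabla u(r)\|^2 + (1-\alpha)\|u(r)\|^2$)

• Geodesics on diffeomorphic transformations

• Induces a metric on images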
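The diffeomorphic transform is the solution of the ODE $\partial_\tau \phi(r,\tau) = v(\phi(r,\tau),\tau)$ with $\phi(r,0) = r$. A minimal 1D sketch, assuming forward-Euler integration and a velocity field given as a Python callable `v(x, tau)`; actual LDDMM implementations work with smooth velocity fields on image grids and optimize over $v$.

```python
def flow(v, r, t=1.0, steps=1000):
    """Forward-Euler approximation of phi(r, t) = r + int_0^t v(phi(r, tau), tau) dtau,
    i.e. of d/dtau phi(r, tau) = v(phi(r, tau), tau) with phi(r, 0) = r."""
    dt = t / steps
    x = r
    for k in range(steps):
        x += dt * v(x, k * dt)  # follow the velocity at the current position and time
    return x
```

A constant velocity simply translates each point, while the velocity $v(x, \tau) = x$ moves the point $r = 1$ to approximately $e$ at $t = 1$.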


Large Deformation Diffeomorphic Metric


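$$d(e,\phi)^2 = \min_v \int_0^t \int_\Omega \|v(r,\tau)\|_Q^2 \, dr \, d\tau$$

$$d(y_i,y_j)^2 = \min_v \int_0^1 \int_\Omega \|v(r,\tau)\|_Q^2 \, dr \, d\tau \quad \text{such that} \quad \int_\Omega \|y_i(\phi(r,1)) - y_j(r)\|_2^2 \, dr = 0$$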





Manifold in Brain Space

[Figure: the space of smooth images, containing the manifold induced by the diffeomorphic image metric and the learned data manifold; samples/images; Fréchet mean on the metric manifold versus Fréchet mean on the data manifold]

[Figure panels: data set: spiral segments; manifold mean; diffeomorphic mean]

Manifold in Brain Space

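Approximating the Diffeomorphic Metric

• For small deformations, work in the tangent space: $\phi(r,1) \approx r + v(r,0) = r + u(r)$

• Distance defined by
$$d_a(y_i,y_j)^2 = \min_u \int_\Omega \|u(r)\|_Q^2 \, dr \quad \text{such that} \quad \int_\Omega \|y_i(r + u(r)) - y_j(r)\|_2^2 \, dr \le \varepsilon$$
which approximates $d(y_i,y_j)^2$ for small deformations

• For symmetry
$$d(y_i,y_j) = \frac{1}{2}\left(d_a(y_i,y_j) + d_a(y_j,y_i)\right)$$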
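The approximate distance can be sketched for 1D images sampled on a grid. This is a hedged illustration under strong assumptions: $\|u\|_Q^2$ uses forward differences, $y_i(r + u(r))$ uses linear interpolation with clamping at the boundary, and the minimization over $u$ is replaced by a search over constant shifts $u(r) = c$ only (a hypothetical simplification, for which the gradient term vanishes). A general $u$ would need a proper constrained optimizer.

```python
import math


def q_norm_sq(u, h, alpha):
    """Discrete ||u||_Q^2 = int alpha |u'|^2 + (1 - alpha) |u|^2 dr (forward differences)."""
    grad = sum(((u[k + 1] - u[k]) / h) ** 2 for k in range(len(u) - 1)) * h
    mag = sum(x * x for x in u) * h
    return alpha * grad + (1 - alpha) * mag


def sample(y, pos, h):
    """Linear interpolation of grid image y at position pos, clamped to the domain."""
    s = min(max(pos / h, 0.0), len(y) - 1.0)
    k = min(int(s), len(y) - 2)
    frac = s - k
    return (1 - frac) * y[k] + frac * y[k + 1]


def mismatch(yi, yj, u, h):
    """int ||y_i(r + u(r)) - y_j(r)||^2 dr on the grid r_k = k h."""
    return sum((sample(yi, k * h + u[k], h) - yj[k]) ** 2 for k in range(len(yj))) * h


def da_translation(yi, yj, h, alpha, eps, max_shift=30):
    """d_a(y_i, y_j) with u restricted to constant shifts c = m h (hypothetical simplification)."""
    best = math.inf
    for m in range(-max_shift, max_shift + 1):
        u = [m * h] * len(yj)
        if mismatch(yi, yj, u, h) <= eps:
            best = min(best, q_norm_sq(u, h, alpha))
    return math.sqrt(best)  # inf if no shift satisfies the constraint


def symmetric_distance(yi, yj, h, alpha, eps):
    """d(y_i, y_j) = (d_a(y_i, y_j) + d_a(y_j, y_i)) / 2."""
    return 0.5 * (da_translation(yi, yj, h, alpha, eps) + da_translation(yj, yi, h, alpha, eps))
```

For two Gaussian bumps 0.1 apart on [0, 1] (101 grid points, h = 0.01, alpha = 0.5), the search picks c = 0.1 in one direction and c = -0.1 in the other, so both one-sided distances and their symmetric average equal $0.1\sqrt{0.5 \cdot 1.01}$.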


Manifold Representation

• Represent the manifold as the conditional expectation of some function: $g(x) = E[Y \mid f(Y) = x]$

• In a non-Euclidean space, use the Fréchet mean: $y_m = \arg\min_{y \in M} \sum_{i=1}^n w_i \, d(y,y_i)^2$

$$f(y) = \frac{\sum_{i=1}^n K_y(d(y,y_i))\, z_i}{\sum_{j=1}^n K_y(d(y,y_j))}$$

$$g(x) = \arg\min_y \sum_{i=1}^n \frac{K_x(\|x - f(y_i)\|_2)}{\sum_{j=1}^n K_x(\|x - f(y_j)\|_2)} \, d(y,y_i)^2$$
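Because both formulas only use the metric $d$ on the data side, they can be sketched without any image machinery. A minimal sketch, assuming Gaussian kernels with hand-picked bandwidths and, as a hypothetical simplification, an argmin over a finite candidate set instead of over the whole manifold $M$. The circle example shows why the Fréchet mean matters: the arithmetic mean of the angles 350° and 10° is 180°, while the equally weighted Fréchet mean under arc length is 0°.

```python
import math


def normalized_kernel_weights(dists, sigma):
    """Gaussian kernel weights exp(-d^2 / (2 sigma^2)), normalized to sum to one."""
    w = [math.exp(-d * d / (2 * sigma * sigma)) for d in dists]
    total = sum(w)
    return [wi / total for wi in w]


def coord_map(y, data, zs, dist, sigma_y):
    """f(y) = sum_i K_y(d(y, y_i)) z_i / sum_j K_y(d(y, y_j)); needs only the metric d."""
    w = normalized_kernel_weights([dist(y, yi) for yi in data], sigma_y)
    return [sum(wi * z[k] for wi, z in zip(w, zs)) for k in range(len(zs[0]))]


def frechet_mean(weights, data, candidates, dist):
    """argmin of sum_i w_i d(y, y_i)^2, searched over a finite candidate set only."""
    return min(candidates, key=lambda c: sum(w * dist(c, yi) ** 2 for w, yi in zip(weights, data)))


def reconstruct(x, data, xs, dist, sigma_x, candidates):
    """g(x): Frechet mean with weights K_x(||x - f(y_i)||_2), where xs[i] = f(y_i)."""
    w = normalized_kernel_weights([math.dist(x, xi) for xi in xs], sigma_x)
    return frechet_mean(w, data, candidates, dist)


def arc_dist(a, b):
    """Geodesic distance between two angles on the unit circle."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)
```

With candidates at every whole degree, `frechet_mean([0.5, 0.5], [350°, 10°], ...)` returns 0 rather than 180°.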



B. Davis, P. Fletcher, E. Bullitt, S. Joshi, Population shape regression from random design data, ICCV 2007

Manifold Representation
• Compute an embedding from the pairwise distance matrix (Isomap)

• Define the coordinate mapping with the kernel map manifold approach


• In all steps, large distances have negligible effect


Results
• OASIS data set

• 416 subjects, age 16 to 80

• 100 subjects diagnosed with mild to moderate dementia

• ADNI data set

• 156 subjects, age 57 to 88

• 38 normal, 84 MCI, 34 early AD

[Figure: two MMSE histograms]

OASIS 2D Embedding

Manifold Fit - OASIS
• Measure reconstruction error

• Comparison to PCA

• Comparison of different metrics

• Scale by the average nearest-neighbor distance:
$$\text{error} = \frac{\sum_i d(g(f(y_i)), y_i)}{\sum_i d(\mathrm{nn}(y_i), y_i)}$$
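The error normalization divides the summed reconstruction distances by the summed nearest-neighbor distances, so a value of 1 means that reconstructions are, on average, as far from their samples as the nearest other sample. A minimal sketch for any metric passed as a function; the 1D numbers in the check are made up.

```python
def scaled_reconstruction_error(data, recon, dist):
    """error = sum_i d(g(f(y_i)), y_i) / sum_i d(nn(y_i), y_i).

    recon[i] holds the reconstruction g(f(y_i)); nn(y_i) is the nearest other sample.
    """
    numerator = sum(dist(r, y) for r, y in zip(recon, data))
    denominator = sum(
        min(dist(y, data[j]) for j in range(len(data)) if j != i)
        for i, y in enumerate(data)
    )
    return numerator / denominator
```

For samples 0, 1, 3 the nearest-neighbor distances sum to 4; reconstructions 0.5, 1, 2 are off by a total of 1.5, giving an error of 0.375.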

Manifold Model PCA

Manifold Fit - ADNI

1.07 0.81 1.23 Projection distance

Statistical Analysis - OASIS
• Linear regression on age, MMSE, CDR

• Comparison to PCA and to age as predictor

• Controlled for age; BIC to select the best model
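The slides select the best model with BIC after controlling for age. A minimal sketch, assuming Gaussian linear models, for which BIC = n ln(RSS/n) + k ln n up to an additive constant. The synthetic data and the "manifold coordinate" predictor in the check are hypothetical, not the OASIS data.

```python
import math


def ols_rss(X, y):
    """Residual sum of squares of the least-squares fit y ~ X (rows of X are samples).

    Solves the normal equations X^T X b = X^T y by Gauss-Jordan elimination
    with partial pivoting.
    """
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * t for row, t in zip(X, y))] for i in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum((t - sum(b * v for b, v in zip(beta, row))) ** 2 for row, t in zip(X, y))


def bic(X, y):
    """BIC = n ln(RSS / n) + k ln n for a Gaussian linear model with k coefficients."""
    n, k = len(y), len(X[0])
    return n * math.log(ols_rss(X, y) / n) + k * math.log(n)
```

Comparing an intercept-plus-age design against one that also includes the extra coordinate, the model with the lower BIC is preferred; the ln n penalty keeps predictors that barely reduce the RSS from being selected.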

Statistical Analysis - OASIS

• Restricted to subjects age above 60

Statistical Analysis - ADNI

Reconstructions - ADNI

ADNI - Statistics

Extensions
• Different metrics?

• Transformation based metric is expensive

• No optimization of conditional expectation manifold

• Embedding/Statistics including metric tensor.

• Adding supervision

• Fit manifold with respect to a clinical predictor

Thank you

This work is supported by

NIH/NCBC grant U54-EB005149
NSF grant CCF-073222

NIBIB grant 5RO1EB007688-02

Thoughts on Manifold Learning
• For which applications/tasks is manifold learning effective?

• Purely unsupervised tasks are rare

• Exploratory analysis

• In supervised settings:

• Manifold learning as regularization

• Feature extraction

• Stratified, non flat-able manifolds and detection of non-manifold structure
