
On Laplacian Eigenmaps for Dimensionality Reduction

Dr. Juan Orduz

PyData Berlin 2018

Overview

Introduction

Warming Up
- The Spectral Theorem

Motivation
- Toy Model Example

The Algorithm
- Description
- Justification

Examples: Scikit-Learn

Spectral Geometry*
- The Laplacian
- The Heat Kernel

Can One Hear the Shape of a Drum? [Kac66]

A differentiable manifold M is a type of manifold that is locally similar enough to a linear space to allow one to do calculus. A (Riemannian) metric g allows us to measure distances.

[Figure: a local chart $U \subset \mathbb{R}^n$.]

We can consider the Laplacian $L : C^\infty(M) \to C^\infty(M)$ and its spectrum $\mathrm{spec}(L) = \{\lambda_0, \lambda_1, \dots, \lambda_k, \dots \to \infty\}$.

- If we are given $\mathrm{spec}(L)$ we can infer the dimension of M, its volume and its total scalar curvature.




Spectral Geometry for Dimensionality Reduction?

Let us assume we have data points $x_1, \dots, x_k \in \mathbb{R}^N$ which lie on an unknown submanifold $M \subset \mathbb{R}^N$.

Key Observation

- Eigenfunctions of L on M can be used to define lower-dimensional embeddings.

Idea ([BN03])

- Model M by constructing a graph $G = (V, E)$ where close data points are connected by edges.
- Construct the graph Laplacian L on G.
- Compute $\mathrm{spec}(L)$ and the corresponding eigenfunctions.
- Use these eigenfunctions to construct an embedding $F : V \to \mathbb{R}^m$ for $m < N$.





The Spectral Theorem

Let $A \in M_{n \times n}(\mathbb{R})$ be a symmetric matrix, i.e. $A = A^\dagger$.

Recall

- $\lambda \in \mathbb{C}$ is an eigenvalue of A with eigenvector $f \in \mathbb{R}^n$, $f \neq 0$, if $Af = \lambda f$.
- A set of vectors $B = \{f_1, f_2, \dots, f_n\}$ is a basis for $\mathbb{R}^n$ if:
  - They are linearly independent.
  - They generate $\mathbb{R}^n$.
- B is said to be an orthonormal basis if $\langle f_i, f_j \rangle = \delta_{ij}$.

Spectral Theorem

There exists an orthonormal basis of $\mathbb{R}^n$ consisting of eigenvectors of A. Each eigenvalue is real.
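The statement can be checked numerically on a small example. The sketch below is only an illustration: the random symmetric matrix, the seed and the variable names are assumptions, and `numpy.linalg.eigh` is used because it is designed for symmetric matrices.

```python
import numpy as np

# Build an arbitrary symmetric matrix (illustrative choice, not from the slides).
rng = np.random.default_rng(0)
B = rng.normal(size=(4, 4))
A = (B + B.T) / 2

# eigh returns real eigenvalues in ascending order; the columns of F are eigenvectors.
eigenvalues, F = np.linalg.eigh(A)

# The eigenvectors form an orthonormal basis: F^T F = I.
is_orthonormal = np.allclose(F.T @ F, np.eye(4))

# Each column satisfies A f_i = lambda_i f_i (broadcasting scales column i by lambda_i).
satisfies_eigen_equation = np.allclose(A @ F, F * eigenvalues)

print(is_orthonormal, satisfies_eigen_equation)
```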








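Min(Max)imizing Properties of Eigenvalues

Let $A \in M_n(\mathbb{R})$ be a symmetric matrix with eigenvalues $\lambda_0 \le \lambda_1 \le \dots \le \lambda_{n-1}$ and corresponding orthonormal eigenvectors $f_0, f_1, \dots, f_{n-1}$.

For later purposes, we would like to find

$$\operatorname*{argmax}_{\|f\| = 1} \langle Af, f \rangle.$$

- Define the associated Lagrange optimization problem

$$\mathcal{L}(f, \lambda) = \langle Af, f \rangle - \lambda (\|f\|^2 - 1).$$

- Take the derivative with respect to f and set it to zero:

$$\frac{\partial}{\partial f} \mathcal{L}(f, \lambda) = 2(Af - \lambda f) \overset{!}{=} 0.$$

- The critical points are therefore unit eigenvectors of A, and at $f = f_i$ the objective takes the value $\langle Af_i, f_i \rangle = \lambda_i$. Hence,

$$\operatorname*{argmax}_{\|f\| = 1} \langle Af, f \rangle = f_{n-1} \quad \text{and} \quad \operatorname*{argmin}_{\|f\| = 1} \langle Af, f \rangle = f_0.$$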
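As a sanity check of this extremal property, the following sketch compares the quadratic form at the extreme eigenvectors with its value on random unit vectors. The matrix, the seed and the helper name `rayleigh` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.normal(size=(5, 5))
A = (B + B.T) / 2  # arbitrary symmetric matrix

eigenvalues, F = np.linalg.eigh(A)  # ascending eigenvalues
f_min, f_max = F[:, 0], F[:, -1]    # eigenvectors of the smallest / largest eigenvalue


def rayleigh(f):
    """Value of <Af, f> after normalizing f to unit length."""
    return f @ A @ f / (f @ f)


# Evaluate the quadratic form on many random directions.
samples = rng.normal(size=(1000, 5))
values = np.array([rayleigh(f) for f in samples])

print(values.min(), ">=", eigenvalues[0])
print(values.max(), "<=", eigenvalues[-1])
```

No random direction should beat the two extreme eigenvectors, which attain the smallest and largest eigenvalue respectively.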




Step 0: Understand the Problem

Consider the problem of mapping these points to a line so that close points stay as close together as possible.

[Figure: four points labelled 1, 2, 3, 4.]

Step 1: From Data to Adjacency Graph

- Define a distance function: first nearest neighbour.
- For each node, attach an edge to its close points.

[Figure: the adjacency graph on the points 1, 2, 3, 4.]



Step 2: Construct the Adjacency and Degree Matrices

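[Figure: the adjacency graph on the nodes 1, 2, 3, 4.]

$$W = \begin{pmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}, \qquad D = \begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$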

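Step 3: Spectrum of the Graph Laplacian

- Construct the operator L defined by

$$L := D - W = \begin{pmatrix} 3 & -1 & -1 & -1 \\ -1 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ -1 & 0 & 0 & 1 \end{pmatrix}.$$

- Consider the generalized eigenvalue problem

$$Lf = \lambda Df.$$

  Equivalently, $D^{-1}Lf = \lambda f$.
- Eigenvalues: $\lambda_0 = 0$, $\lambda_1 = 1$, $\lambda_2 = 1$, $\lambda_3 = 2$.
- An eigenvector for $\lambda_1 = 1$ is $y := f_1 = (0, -3, 1, 2)$.
- The vector $y : V \to \mathbb{R}$ defines an embedding.

[Figure: the nodes 1, 2, 3, 4 placed on a line according to y.]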
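The numbers of the toy model can be reproduced with a few lines of NumPy. This is a sketch, not the code from the talk: it computes the generalized eigenvalues through the symmetric matrix $D^{-1/2} L D^{-1/2}$, which has the same eigenvalues as $D^{-1}L$, so that the symmetric solver `numpy.linalg.eigvalsh` can be used.

```python
import numpy as np

# Adjacency matrix of the star graph from Step 2.
W = np.array([
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
], dtype=float)
D = np.diag(W.sum(axis=1))  # degree matrix
L = D - W                   # graph Laplacian

# D^{-1/2} L D^{-1/2} is symmetric and shares its eigenvalues with D^{-1} L.
d_inv_sqrt = 1.0 / np.sqrt(np.diag(D))
L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
eigenvalues = np.linalg.eigvalsh(L_sym)

# Check that y = (0, -3, 1, 2) solves L y = lambda D y with lambda = 1.
y = np.array([0.0, -3.0, 1.0, 2.0])
residual = L @ y - 1.0 * (D @ y)

print(np.round(eigenvalues, 6))
print(residual)
```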



The Algorithm

Let $x_1, \dots, x_k \in \mathbb{R}^N$.

1. Construct a weighted graph $G = (V, E)$ with k nodes, one for each point, and a set of edges connecting neighbouring points. Select a distance function:
   - (Euclidean distance) Let $\varepsilon > 0$. We connect i and j by an edge if $\|x_i - x_j\|^2 < \varepsilon$.
   - n nearest neighbours.
2. Choose weights. If nodes i and j are connected, put
   - $W_{ij} = 1$, or
   - (Heat kernel) $W_{ij} := e^{-\|x_i - x_j\|^2 / t}$ for some $t > 0$.
3. Assume G is connected. Compute the eigenvalues of the generalized eigenvector problem $Lf = \lambda Df$, where
   - D is the diagonal weight matrix, $D_{ii} = \sum_{j=1}^{k} W_{ij}$.
   - $L := D - W$ is the graph Laplacian.
4. Construct the embedding. Let $f_0, f_1, \dots, f_{k-1}$ be the corresponding eigenvectors ordered according to their eigenvalues ($\lambda_0 = 0$). For $m < N$, set

$$F(i) := (f_1(i), \dots, f_m(i)).$$
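The four steps above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the talk: the function name `laplacian_eigenmaps`, its parameters, the choice of the $\varepsilon$-neighbourhood with heat kernel weights, and the helix test data are all hypothetical. The generalized problem is solved through the symmetric matrix $D^{-1/2} L D^{-1/2}$ and the eigenvectors are mapped back with $f = D^{-1/2} g$.

```python
import numpy as np


def laplacian_eigenmaps(X, m=2, eps=1.0, t=1.0):
    """Sketch of the algorithm: eps-graph, heat kernel weights, L f = lambda D f."""
    # Step 1: connect i and j if the squared Euclidean distance is below eps.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    adjacency = (sq_dists < eps) & ~np.eye(len(X), dtype=bool)

    # Step 2: heat kernel weights on the edges, zero elsewhere.
    W = np.where(adjacency, np.exp(-sq_dists / t), 0.0)

    # Step 3: degree matrix, graph Laplacian and generalized eigenproblem.
    degrees = W.sum(axis=1)
    if np.any(degrees == 0):
        raise ValueError("isolated point found: increase eps")
    L = np.diag(degrees) - W
    d_inv_sqrt = 1.0 / np.sqrt(degrees)
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    eigenvalues, G = np.linalg.eigh(L_sym)  # ascending order
    F = d_inv_sqrt[:, None] * G             # f = D^{-1/2} g solves L f = lambda D f

    # Step 4: drop f_0 (eigenvalue 0) and keep the next m eigenvectors.
    return eigenvalues[1:m + 1], F[:, 1:m + 1]


# Illustrative data: points along a helix in R^3, embedded into R^1.
s = np.linspace(0, 4 * np.pi, 80)
X = np.column_stack([np.cos(s), np.sin(s), 0.2 * s])
eigenvalues, embedding = laplacian_eigenmaps(X, m=1, eps=0.3, t=0.1)
print(embedding.shape)
```

The sketch builds dense matrices, which is only reasonable for small k; it also does not check that the graph is connected beyond rejecting isolated points.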







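Why does it work? m = 1

Assume you have constructed the weighted graph $G = (V, E)$. We want to construct an embedding $F : V \to \mathbb{R}$.

Hint: Minimize

$$J(y) := \sum_{i,j=1}^{k} (y_i - y_j)^2 W_{ij} = 2 y^\dagger L y.$$

Thus, the problem reduces to finding

$$\operatorname*{argmin}_{\substack{y^\dagger D y = 1 \\ y^\dagger D \mathbf{1} = 0}} y^\dagger L y = \operatorname*{argmin}_{\substack{y^\dagger D y = 1 \\ y^\dagger D \mathbf{1} = 0}} \langle Ly, y \rangle.$$

- $y^\dagger D y = 1$ fixes the scale.
- $y^\dagger D \mathbf{1} = 0$ eliminates the trivial solution $y = \mathbf{1}$.

This translates to finding the smallest non-zero eigenvalue and the corresponding eigenvector of

$$Ly = \lambda Dy.$$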
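For completeness, the identity $J(y) = 2 y^\dagger L y$ follows by expanding the square. The only assumptions are the ones already made on the slides: W is symmetric and $D_{ii} = \sum_j W_{ij}$.

```latex
\begin{aligned}
\sum_{i,j=1}^{k} (y_i - y_j)^2 W_{ij}
  &= \sum_{i,j=1}^{k} \left( y_i^2 + y_j^2 - 2 y_i y_j \right) W_{ij} \\
  &= \sum_{i=1}^{k} y_i^2 D_{ii} + \sum_{j=1}^{k} y_j^2 D_{jj} - 2 \sum_{i,j=1}^{k} y_i W_{ij} y_j \\
  &= 2 y^\dagger D y - 2 y^\dagger W y \\
  &= 2 y^\dagger (D - W) y = 2 y^\dagger L y .
\end{aligned}
```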




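Why does it work? m > 1 (Vectorize)

Assume you have constructed the weighted graph $G = (V, E)$. We want to construct an embedding $F : V \to \mathbb{R}^m$.

Hint: Minimize, for $Y = (y_1 \cdots y_m) \in M_{k \times m}(\mathbb{R})$ with rows $Y_i$,

$$J(Y) := \sum_{i,j=1}^{k} \|Y_i - Y_j\|^2 W_{ij} = 2 \operatorname{tr}(Y^\dagger L Y).$$

Thus, the problem reduces to finding

$$\operatorname*{argmin}_{Y^\dagger D Y = I} \operatorname{tr}(Y^\dagger L Y).$$

This translates to finding the smallest non-zero eigenvalues and the corresponding eigenvectors of

$$Lf = \lambda Df.$$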
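The trace form of the objective can be verified numerically. In the sketch below the weight matrix and the candidate embedding are random and purely illustrative. Because the double sum runs over ordered pairs, each edge is counted twice, so the sum equals $2 \operatorname{tr}(Y^\dagger L Y)$, in agreement with the factor 2 of the case m = 1; the constant factor does not change the minimizer.

```python
import numpy as np

rng = np.random.default_rng(0)
k, m = 6, 2

# Random symmetric weight matrix with zero diagonal (illustrative only).
A = rng.random((k, k))
W = (A + A.T) / 2
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W

# Arbitrary candidate embedding: row i of Y is the image of node i.
Y = rng.normal(size=(k, m))

# Left-hand side: sum over all ordered pairs (i, j) of ||Y_i - Y_j||^2 W_ij.
diffs = Y[:, None, :] - Y[None, :, :]
J = np.sum(np.sum(diffs ** 2, axis=-1) * W)

# Right-hand side: trace form.
trace_form = 2 * np.trace(Y.T @ L @ Y)

print(J, trace_form)
```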

Examples: Scikit-Learn

Let us go to a Jupyter notebook to see some examples.
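For readers without the notebook at hand, here is a minimal sketch of what such an example could look like. It is not taken from the talk's notebook: the swiss roll data set and all parameter values are assumptions. It uses `sklearn.manifold.SpectralEmbedding`, scikit-learn's implementation of Laplacian Eigenmaps, with a nearest-neighbour graph.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import SpectralEmbedding

# Sample points from a 2-dimensional surface rolled up in R^3 (illustrative data).
X, color = make_swiss_roll(n_samples=500, random_state=0)

# Build a 10-nearest-neighbour graph and embed into R^2.
embedding = SpectralEmbedding(
    n_components=2,
    affinity="nearest_neighbors",
    n_neighbors=10,
    random_state=0,
)
Y = embedding.fit_transform(X)

print(X.shape, "->", Y.shape)
```

Plotting the two columns of `Y` coloured by `color` is a common way to inspect how well the roll has been unfolded.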

The Laplacian

Second order differential operator $L : C^\infty_c(M) \to C^\infty_c(M)$.

- For $M = \mathbb{R}^n$,

$$L = -\sum_{i=1}^{n} \frac{\partial^2}{\partial x_i^2}.$$

- For $(M, g)$ a Riemannian manifold,

$$L = -\sum_{i=1}^{n} \sum_{j=1}^{n} g^{ij} \frac{\partial^2}{\partial x_i \partial x_j} + \text{lower order terms}.$$

Spectral Theorem ([Ros97])

L is symmetric with respect to the inner product in $C^\infty_c(M)$,

$$(f, g)_{L^2} = \int_M f(x) g(x) \, dx.$$

If M is compact, there exists an orthonormal basis of $L^2(M)$ consisting of eigenvectors of L. Each eigenvalue is real.


Embedding through Eigenmaps

Let $(M, g)$ be a compact Riemannian manifold and $f : M \to \mathbb{R}$.

- If $x, z \in M$ are close, then

$$|f(x) - f(z)| \le \mathrm{dist}_M(x, z) \, \|\nabla f\| + o(\mathrm{dist}_M(x, z)).$$

- We want a map that best preserves locality on average,

$$\operatorname*{argmin}_{\|f\|_{L^2(M)} = 1} \int_M \|\nabla f\|^2 \, dx. \qquad (1)$$

- By Stokes' Theorem,

$$\int_M \|\nabla f\|^2 \, dx = \int_M (Lf) f \, dx = (Lf, f)_{L^2}.$$

- The minimizer in (1) must be an eigenfunction of the Laplacian.






The Graph Laplacian as a Differential Operator

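[Figure: the graph on the nodes 1, 2, 3, 4 with edges $e_1$, $e_2$, $e_3$.]

$$\nabla = \begin{pmatrix} -1 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ -1 & 0 & 0 & 1 \end{pmatrix} \;\Rightarrow\; \nabla^\dagger \nabla = \begin{pmatrix} 3 & -1 & -1 & -1 \\ -1 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ -1 & 0 & 0 & 1 \end{pmatrix}$$

So we see,

$$L = \nabla^\dagger \nabla.$$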
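The factorization can be confirmed directly for the toy graph. In the sketch below, `grad` is the edge-by-node matrix from the slide (one row per edge), and the test function `f` on the nodes is an arbitrary illustrative choice showing that `grad @ f` returns the differences of f along the edges.

```python
import numpy as np

# Rows correspond to the edges e1 = (1, 2), e2 = (1, 3), e3 = (1, 4).
grad = np.array([
    [-1, 1, 0, 0],
    [-1, 0, 1, 0],
    [-1, 0, 0, 1],
], dtype=float)

# Adjacency and degree matrices of the same graph.
W = np.array([
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
], dtype=float)
D = np.diag(W.sum(axis=1))

# The graph Laplacian as "divergence of the gradient".
L = grad.T @ grad

# Applied to a function on the nodes, grad gives the difference along each edge.
f = np.array([5.0, 7.0, 4.0, 5.0])
edge_differences = grad @ f

print(np.array_equal(L, D - W))
print(edge_differences)
```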

The Heat Kernel

Let $f : M \to \mathbb{R}$. Consider the heat equation on M,

$$(\partial_t + L) u(x, t) = 0 \quad \text{with initial condition } u(x, 0) = f(x).$$

- The solution is given by ([Ros97])

$$u(x, t) = \int_M H_t(x, y) f(y) \, dy,$$

  where the heat kernel has the form

$$H_t(x, y) = (4 \pi t)^{-\dim(M)/2} \, e^{-\frac{\mathrm{dist}_M(x, y)^2}{4t}} \left( \phi(x, y) + O(t) \right),$$

  for a certain smooth function $\phi$ with $\phi(x, x) = 1$.
- It can be shown that, for $x_1, \dots, x_k \in M$ and $t > 0$ small,

$$Lf(x_i) \approx \frac{1}{t} \left[ f(x_i) - \frac{\sum_{0 < \|x_i - x_j\|^2 < \varepsilon} e^{-\frac{\|x_i - x_j\|^2}{4t}} f(x_j)}{\sum_{0 < \|x_i - x_j\|^2 < \varepsilon} e^{-\frac{\|x_i - x_j\|^2}{4t}}} \right],$$

  which justifies $W_{ij} = e^{-\frac{\|x_i - x_j\|^2}{4t}}$.





References

Slides and notebook available at juanitorduz.github.io

[BN03] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003.

[Kac66] Mark Kac. Can one hear the shape of a drum? The American Mathematical Monthly, 73(4):1–23, 1966.

[Ros97] Steven Rosenberg. The Laplacian on a Riemannian Manifold: An Introduction to Analysis on Manifolds. London Mathematical Society Student Texts. Cambridge University Press, 1997.
