HIM Trimester Program on Analysis and Numerics for High Dimensional Problems
Book of Abstracts
Workshop on Manifold Learning
Bonn, 30 May – 3 June
Organizers: Jochen Garcke and Michael Griebel
Institute for Numerical Simulation, Rheinische Friedrich-Wilhelms-Universität Bonn
In the field of manifold learning, non-linear methodologies are investigated to efficiently describe high-dimensional data by lower-dimensional structures. The research is motivated by the observation, made in many data-driven research fields, that a rich structure is present in the application data which can and needs to be exploited for an efficient representation.

In recent years several new machine learning algorithms were introduced which allow such a nonlinear dimension reduction. They aim to exploit local structure and estimates of the intrinsic geometry, dimension, or topology. Theoretical insights from topology have enabled new methods for dimensionality estimation. New regularisation approaches for classification and regression which take the geometry into account are also closely related to manifold learning.

The workshop aims to bring together researchers interested in manifold learning and dimension reduction. Due to the diverse nature of the field this includes, but is not limited to, people from machine learning, numerical mathematics, linear algebra, topology, geometry, and statistics.
Workshop Venue
The workshop is hosted by the Hausdorff Research Institute for Mathematics, Universität Bonn, http://www.hausdorff-research-institute.uni-bonn.de, in cooperation with the Institute for Numerical Simulation, http://www.ins.uni-bonn.de.
Conference dinner
Wednesday 1st June: Conference dinner at the restaurant Im Stiefel, Bonngasse 30 (close to the Beethoven-Haus, directly in the city centre), starting at 19:00. You will be charged for the dinner directly by the restaurant.
Excursion
Wednesday 1st June: tba.
Organisers
• Dr Jochen Garcke
• Prof. Michael Griebel
Acknowledgement
The workshop is supported by the Hausdorff Research Institute for Mathematics (HIM) as part of its Trimester Program on Analysis and Numerics for High Dimensional Problems. We thank the HIM team for their assistance.

Special thanks go to the members of the Institute for Numerical Simulation at the University of Bonn, in particular Dr Christian Rieger, for doing a major part of the local organization of the workshop.
Schedule overview

Time          Monday      Tuesday     Wednesday        Thursday      Friday
09:30-10:30   -           Lawrence    Hein             Gorban        Zhang
                                   -- coffee --
11:15-12:00   -           Paprotny    Rieger           Guillemard    Pennec (till 12:15)
@12:00        -           -           Wissel           -             -
                                   -- lunch --
14:15-15:15   Lee         Kushnir     social program   Kirby         -
                                   -- cake --
16:00-16:45   Hullmann    Iza Teran   -                Krahmer       -
@19:00        -           -           dinner           -             -
Detailed program
Mon, 14:00-14:15  Garcke, Opening
Mon, 14:15-15:15  Lee, Unsupervised Dimensionality Reduction: from PCA to recent nonlinear techniques
Mon, 16:00-16:45  Hullmann, The Generative Topographic Mapping for dimensionality reduction and data analysis
Tue, 09:30-10:30  Lawrence, A unifying probabilistic perspective on spectral approaches to dimensionality reduction
Tue, 11:15-12:00  Paprotny, An Asymptotic Convergence Result for Maximum Variance Unfolding Based on an Interpretation as a Regularized Shortest Path Problem
Tue, 14:15-15:15  Kushnir, Anisotropic Diffusion Maps with Applications to Inverse Problems
Tue, 16:00-16:45  Iza Teran, Diffusion Maps for Finite-Element Simulation Data
Wed, 09:30-10:30  Hein, Nonlinear Eigenproblems in Machine Learning
Wed, 11:15-12:00  Rieger, Sampling Inequalities and Manifold Regularization
Wed, 12:00-12:45  Wissel, Fast Gauss Transforms for high-dimensional problems
Thu, 09:30-10:30  Gorban, Principal graphs and topological grammars for data approximation
Thu, 11:15-12:00  Guillemard, New Perspectives in Signal Processing Combining Dimensionality Reduction and Persistent Homology
Thu, 14:15-15:15  Kirby, Geometry and the Analysis of Massive Data Sets
Thu, 16:00-16:45  Krahmer, New and improved Johnson-Lindenstrauss embeddings via the Restricted Isometry Property
Fri, 09:30-10:30  Zhang, Spectral Analysis of Alignment Matrices in Manifold Learning
Fri, 11:15-12:15  Pennec, Current Issues in Statistical Analysis on Manifolds for Computational Anatomy
Monday, 30.05.2011
Unsupervised Dimensionality Reduction: from PCA to recent nonlinear techniques
Mon, 14:15-15:15
John A. Lee (1) and Michel Verleysen (2)

1 Molecular Imaging, Radiotherapy, and Oncology
  Institut de Recherche et d'Etude Clinique (IREC)
  Université catholique de Louvain, Bruxelles, Belgium
  [email protected]
2 Machine Learning Group, ICTEAM Institute
  Université catholique de Louvain, Louvain-la-Neuve, Belgium
Dimensionality reduction is an old and yet unsolved problem, with many applications in data visualization, knowledge discovery, and machine learning in general. Our aim in this talk will be to review several developments in the field of dimensionality reduction, with a particular focus on nonlinear methods. As an introduction, we will point out some weird properties of high-dimensional spaces, which will motivate the use of dimensionality reduction. Next, we will go back in time and start our review with a short reminder about well-known techniques such as principal component analysis and multidimensional scaling. Our travel through time will also bring us to visit Sammon mapping and other methods based on distance preservation. Next, we will come across self-organizing maps and auto-encoders with bottleneck neural networks. Some spectral methods such as Isomap, locally linear embedding, Laplacian eigenmaps, and maximum variance unfolding will be reviewed as well. A glance at recent methods based on similarity preservation such as stochastic neighbour embedding will close the survey. In particular, we will show how these methods cope with the phenomenon of distance concentration. Finally, we will try to identify the relationships between the different approaches, and say a few words about quality criteria for dimensionality reduction techniques.
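As a concrete anchor for the survey, the linear baseline it starts from can be sketched in a few lines. This is a generic PCA via SVD, not code from the talk, and the toy data set (a noisy line embedded in five dimensions) is made up:

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the top-k principal directions."""
    Xc = X - X.mean(axis=0)                       # center the data
    # right singular vectors of the centered matrix = eigenvectors of the covariance
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                          # k-dimensional coordinates

# Data near a 1-D line embedded in 5-D: one component captures almost all variance.
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = t @ rng.normal(size=(1, 5)) + 0.01 * rng.normal(size=(200, 5))
Y = pca(X, 1)
print(Y.shape)  # (200, 1)
```

The nonlinear methods reviewed in the talk address exactly the cases where no such single linear subspace captures the structure.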
References:
[1] I. Jolliffe, Principal Component Analysis, Springer, 1986.
[2] I. Borg, P. Groenen, Modern Multidimensional Scaling, Springer, 2005.
[3] J. Lee, M. Verleysen, Nonlinear Dimensionality Reduction, Springer, 2007.
[4] J. Sammon, A nonlinear mapping algorithm for data structure analysis, IEEE Transactions on Computers C-18(5) (1969) 401-409.
[5] P. Demartines, J. Hérault, Curvilinear component analysis: A self-organizing neural network for nonlinear mapping of data sets, IEEE Transactions on Neural Networks 8(1) (1997) 148-154.
[6] M. Kramer, Nonlinear principal component analysis using autoassociative neural networks, AIChE Journal 37(2) (1991) 233-243.
[7] T. Kohonen, Self-organization of topologically correct feature maps, Biological Cybernetics 43 (1982) 59-69.
[8] B. Schölkopf, A. Smola, K.-R. Müller, Nonlinear component analysis as a kernel eigenvalue problem, Neural Computation 10 (1998) 1299-1319.
[9] J. Tenenbaum, V. de Silva, J. Langford, A global geometric framework for nonlinear dimensionality reduction, Science 290(5500) (2000) 2319-2323.
[10] S. Roweis, L. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science 290(5500) (2000) 2323-2326.
[11] K. Weinberger, L. Saul, Unsupervised learning of image manifolds by semidefinite programming, International Journal of Computer Vision 70(1) (2006) 77-90.
[12] M. Brand, K. Huang, A unifying theorem for spectral embedding and clustering, in: C. Bishop, B. Frey (Eds.), Proceedings of the International Workshop on Artificial Intelligence and Statistics (AISTATS'03), 2003.
[13] L. Xiao, J. Sun, S. Boyd, A duality view of spectral methods for dimensionality reduction, in: Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, 2006, pp. 1041-1048.
[14] G. Hinton, S. Roweis, Stochastic neighbor embedding, in: S. Becker, S. Thrun, K. Obermayer (Eds.), Advances in Neural Information Processing Systems (NIPS 2002), Vol. 15, MIT Press, 2003, pp. 833-840.
[15] L. van der Maaten, G. Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research 9 (2008) 2579-2605.
[16] J. Venna, J. Peltonen, K. Nybo, H. Aidos, S. Kaski, Information retrieval perspective to nonlinear dimensionality reduction for data visualization, Journal of Machine Learning Research 11 (2010) 451-490.
[17] D. François, V. Wertz, M. Verleysen, The concentration of fractional distances, IEEE Transactions on Knowledge and Data Engineering 19(7) (2007) 873-886.
[18] D. Donoho, High-Dimensional Data Analysis: The Curse and Blessings of Dimensionality, aide-mémoire for a lecture for the American Mathematical Society "Math Challenges of the 21st Century" (2000).
The Generative Topographic Mapping for dimensionality reduction and data analysis
Mon, 16:00-16:45
Alexander Hullmann
Institute for Numerical Simulation
Bonn University
Most high-dimensional data exhibit some correlation structure, which means that they are not distributed uniformly in the data space and have a low intrinsic dimension. Principal Component Analysis (PCA) is the best-known method to find low-dimensional parametrisations of high-dimensional data. As it fails to capture non-linear dependencies, several more complex methods have been invented.

The Generative Topographic Mapping (GTM) is a method for non-linear dimensionality reduction, published in 1998 by Bishop, Williams and Svensén. We will show how a continuous formulation of the GTM allows for the use of efficient quadrature methods and discretisations.

Instead of the usual full-grid discretisation of the mapping from the embedding to the data space, we use sparse grids to break the 'curse of dimensionality' to some extent. Alternatively, we restrict the discretisation to low-order ANOVA terms, which removes any exponential dependence of the runtime complexity on the embedding dimension. From a statistical perspective, this model exploits the independence between different groups of data space dimensions, yielding a method that is more powerful than the PCA but significantly faster than the original GTM. As a third possibility, we discuss the application of the representer theorem to the discretisation of our mapping.

As has been shown, the Euclidean metric may not be optimal in high dimensions, which is related to the 'concentration of measure' effect. We describe a GTM based on the p-Minkowski norm, and show that its practicality depends on the noise the data exhibit. Furthermore, we will use the GTM not only for dimensionality reduction, but also for classification problems. This allows for demonstrative experiments that study effects of the intrinsic dimension of high-dimensional data.
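The 'concentration of measure' effect mentioned above is easy to observe numerically. The following sketch (not from the talk; the uniform sampling model and sample sizes are arbitrary choices) shows how the relative contrast between the largest and smallest distances in a random point cloud collapses as the dimension grows:

```python
import numpy as np

def distance_spread(d, n=500, seed=0):
    """Relative contrast (max - min) / min of distances to the cube centre in d dimensions."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n, d))                 # n points in the unit cube
    dist = np.linalg.norm(X - 0.5, axis=1)       # distances to the centre
    return (dist.max() - dist.min()) / dist.min()

low, high = distance_spread(2), distance_spread(1000)
print(low, high)   # the contrast shrinks drastically as d grows
```

In very high dimensions all distances become nearly equal, which is one reason the Euclidean metric can lose discriminative power there.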
Tuesday, 31.05.2011
A unifying probabilistic perspective on spectral approaches to dimensionality reduction
Tue, 09:30-10:30
Neil Lawrence
University of Sheffield
Spectral approaches to dimensionality reduction typically reduce the dimensionality of a data set by taking the eigenvectors of a Laplacian or of a similarity matrix. Classical multidimensional scaling also makes use of the eigenvectors of a similarity matrix. In this talk we introduce a maximum entropy approach to designing this similarity matrix. The approach is closely related to maximum variance unfolding. Other spectral approaches such as locally linear embeddings and Laplacian eigenmaps also turn out to be closely related. Each method can be seen as a sparse Gaussian graphical model where correlations between data points (rather than across data features) are specified in the graph. This also suggests optimization via sparse inverse covariance techniques such as the graphical LASSO. The hope is that this unifying perspective will allow the relationships between these methods to be better understood and will also provide the groundwork for further research.
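For readers unfamiliar with the spectral viewpoint, classical multidimensional scaling illustrates the shared pattern of embedding via the eigenvectors of a (here, double-centred) similarity matrix. This is a generic textbook sketch, not the maximum entropy construction of the talk:

```python
import numpy as np

def classical_mds(D, k):
    """Embed points into k dimensions from a squared-distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ D @ J                          # double-centred Gram (similarity) matrix
    w, V = np.linalg.eigh(B)                      # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]                 # top-k eigenpairs
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))

# Points in the plane, observed only through pairwise squared distances.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
D = ((X[:, None] - X[None, :]) ** 2).sum(-1)
Y = classical_mds(D, 2)
D_rec = ((Y[:, None] - Y[None, :]) ** 2).sum(-1)
print(np.allclose(D, D_rec))                      # True: distances are reproduced
```

The spectral methods unified in the talk differ mainly in how the similarity matrix fed to this eigendecomposition step is constructed.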
An Asymptotic Convergence Result for Maximum Variance Unfolding Based on an Interpretation as a Regularized Shortest Path Problem
Tue, 11:15-12:00
Alexander Paprotny(joint work with Jochen Garcke)
Technische Universität Berlin
Institut für Mathematik
We study an equivalent formulation of Maximum Variance Unfolding in terms of distance rather than Gram matrices. This yields a novel interpretation of the MVU problem as a regularized version of the shortest path problem on a graph. This interpretation enables an asymptotic convergence result for the case that the underlying data are drawn from a Riemannian manifold which is isometric to a convex subset of Euclidean space.
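The shortest-path side of this interpretation can be illustrated with a generic graph-geodesic computation (a k-nearest-neighbour graph plus Floyd-Warshall, as used in Isomap). This is background illustration, not the MVU algorithm itself, and the sample curve and neighbourhood size are arbitrary:

```python
import numpy as np

def graph_geodesics(X, n_neighbors=5):
    """All-pairs shortest-path distances on a k-nearest-neighbour graph (Floyd-Warshall)."""
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    nn = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]      # nearest neighbours, skipping self
    for i in range(n):
        G[i, nn[i]] = D[i, nn[i]]
        G[nn[i], i] = D[i, nn[i]]                         # symmetrize the graph
    for k in range(n):                                    # Floyd-Warshall relaxation
        G = np.minimum(G, G[:, k:k + 1] + G[k:k + 1, :])
    return G

# Points on a quarter circle: graph geodesics approximate arc length, not chord length.
t = np.linspace(0, np.pi / 2, 40)
X = np.c_[np.cos(t), np.sin(t)]
G = graph_geodesics(X)
print(G[0, -1])   # near the arc length pi/2 ~ 1.57, not the chord sqrt(2) ~ 1.41
```

On a manifold isometric to a convex Euclidean set, such graph distances converge to the intrinsic geodesic distances, which is the regime of the convergence result above.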
Anisotropic Diffusion Maps with Applications to Inverse Problems
Tue, 14:15-15:15
Dan Kushnir
Applied Mathematics, Yale University
In this talk I will present an efficient method for computing an extendable re-parameterization of high-dimensional stochastic data sets generated by nonlinear mixing of independent parameters. Our method relies on the spectral decomposition of a particular diffusion kernel constructed from the observed data. According to Sturm-Liouville oscillation theory, a subset of the kernel eigenvectors are monotonic functions of the original parameters. Thus, a re-parameterization via these eigenvectors provides an inverse map back to the space of physically meaningful parameters. We further show how the inverse mapping can be efficiently extended to newly observed data, and how this extension enhances the performance of data classification. We demonstrate the use of our method for empirically solving inverse problems such as the reconstruction of geological formations from electromagnetic measurements, and source localization in acoustics.
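As background, the basic isotropic diffusion-map construction underlying this talk can be sketched as follows. This is the standard recipe (Gaussian kernel, Markov normalization, leading non-trivial eigenvector), not the anisotropic kernel of the talk, and the test curve and bandwidth eps are made up:

```python
import numpy as np

def diffusion_map(X, eps, k=1):
    """Top-k non-trivial eigenpairs of the Markov matrix P = D^-1 K for a Gaussian kernel K."""
    D2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
    K = np.exp(-D2 / eps)                         # Gaussian affinities
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))               # symmetric conjugate of P (same spectrum)
    w, V = np.linalg.eigh(S)
    w, V = w[::-1], V[:, ::-1]                    # sort eigenpairs in descending order
    phi = V / V[:, [0]]                           # back to eigenvectors of P (phi_0 = const)
    return w[1:k + 1], phi[:, 1:k + 1]

# Samples along an arc: the leading non-trivial eigenvector tracks the curve parameter t,
# illustrating the monotonicity property that enables the inverse map.
rng = np.random.default_rng(2)
t = np.sort(rng.uniform(0, 1, 80))
X = np.c_[np.cos(4 * t), np.sin(4 * t)]
lam, psi = diffusion_map(X, eps=0.05)
print(abs(np.corrcoef(psi[:, 0], t)[0, 1]))       # high: psi_1 is a monotone-like function of t
```

The talk's contribution lies in replacing the isotropic kernel with an anisotropic one tailored to the independent mixing parameters, and in the efficient out-of-sample extension.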
Diffusion Maps for Finite-Element Simulation Data
Tue, 16:00-16:45
Rodrigo Iza Teran
Fraunhofer Institute for Scientific Computing
We present the application of a nonlinear dimension reduction method, diffusion maps, to find a low-dimensional description of several hundred finite element simulations. This work is motivated by engineering problems in virtual product design, where the physical behaviour of the product is modeled using a very detailed model that depends on many design variables. A cost-effective configuration fulfilling a required functionality (e.g. providing passenger security) and requirements (e.g. security constraints) is sought, based on engineering judgement of many trial simulations obtained by varying the corresponding design variables. This process is sequential and time-consuming. We propose a way to organize and parametrize the information from many trial configurations along low-dimensional representations. This parametrization is demonstrated to be very useful for the task of comparing several hundred trial configurations simultaneously. We present the application of the method to several industry examples in the areas of metal forming, crash and vibration analysis. It is shown that the application of such a dimension reduction in the proposed way has the potential of significantly improving the standard way of doing finite-element-based virtual product development.
Wednesday, 01.06.2011
Nonlinear Eigenproblems in Machine Learning
Wed, 09:30-10:30
Matthias Hein
Universität des Saarlandes
Many problems in data analysis can be formulated as (generalized) eigenproblems. In this talk I will discuss nonlinear eigenproblems, which allow extended modeling freedom compared to linear eigenproblems, in particular concerning robustness and sparsity. After an introduction of the framework and the discussion of an efficient generalization of the inverse power method, two examples will be discussed in more detail: first, spectral clustering based on the nonlinear 1-graph Laplacian, where we could show that the second eigenvalue of the 1-Laplacian is equal to the optimal Cheeger cut; and, as a second example, sparse PCA.
Sampling Inequalities and Manifold Regularization
Wed, 11:15-12:00
Christian Rieger
Institut für Numerische Simulation
Universität Bonn
Sampling inequalities quantify the observation that a smooth function is globally small if it produces small discrete data. Inequalities of this kind have been used to derive a priori error estimates for various regularized approximation problems in many machine learning algorithms and PDE solvers. We demonstrate how the general principles can be applied to manifold regularization. Here, data embedded into a high-dimensional space are supposed to live on a lower-dimensional submanifold, and unlabeled data are used to identify the manifold. We present an error analysis for this problem when interpreted as a regularized reconstruction method.
Fast Gauss Transforms for high-dimensional problems
Wed, 12:00-12:45
Daniel Wissel
Institut für Numerische Simulation, Universität Bonn
The Gauss transform is an important tool in many areas, with applications in image manipulation, option pricing and data mining, including classification, regression and density estimation. The discrete Gauss transform (DGT) in d-dimensional space corresponds to a matrix-vector multiplication involving O(N · M · d) operations, where N is the number of source points and M the number of evaluation points. There exist several approximation algorithms for the DGT based on the Fast Multipole Method (FMM) that reduce the complexity to O((N + M) · ((p+d) choose d)) with a series truncation parameter p. We propose two new algorithms. The first one seeks to combine the advantages of the existing algorithms to achieve a competitive performance for various parameter settings and dimensions up to 10 and above. The second algorithm aims to exploit local low-dimensional features in the data to further reduce the runtime complexity and break the curse of dimensionality to a certain extent. Numerical results with synthetic as well as real-world data reveal the strengths and drawbacks of the current algorithms.
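For reference, the direct O(N · M · d) evaluation that the fast algorithms accelerate can be written down in a few lines. The data sizes and the bandwidth h below are arbitrary illustration values:

```python
import numpy as np

def direct_gauss_transform(sources, weights, targets, h):
    """Direct DGT: G(y_j) = sum_i q_i * exp(-||y_j - x_i||^2 / h^2), O(N*M*d) work."""
    D2 = ((targets[:, None, :] - sources[None, :, :]) ** 2).sum(-1)   # (M, N) squared distances
    return np.exp(-D2 / h ** 2) @ weights                             # weighted kernel sums

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))    # N = 100 source points in d = 3
q = rng.uniform(size=100)        # source weights
Y = rng.normal(size=(50, 3))     # M = 50 evaluation points
G = direct_gauss_transform(X, q, Y, h=1.0)
print(G.shape)  # (50,)
```

The FMM-type algorithms discussed in the talk trade this exact quadratic-cost sum for a truncated series expansion, which is where the binomial factor in their complexity arises.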
Thursday, 02.06.2011
Principal graphs and topological grammars for data approximation
Thu, 09:30-10:30
A.N. Gorban (1) and A.Y. Zinovyev (2)

1 University of Leicester, Leicester, UK
  [email protected]
2 Institut Curie, Paris, France
Revealing geometry and topology in a finite dataset is an intriguing problem. We present several methods of non-linear data modelling and construction of principal manifolds and principal graphs. These methods are based on the metaphor of elasticity (the elastic principal graph approach). The elastic energy functionals are quadratic and, hence, the computational procedures are not very expensive. The simplest algorithms have the classical expectation/maximization (or splitting) structure. For complexity control, several types of complexity are introduced. Construction of principal graphs with controlled complexity is based on the graph grammar approach and on the idea of pluriharmonic graphs as ideal approximate objects. We present several applications for microarray analysis and visualization of various datasets from genomics, medical and social research. GIS-inspired methods of dataset cartography are used. In particular, we demonstrate estimation and visualization of uncertainty.
1. A. N. Gorban, A. Zinovyev, Principal manifolds and graphs in practice: from molecular biology to dynamical systems, International Journal of Neural Systems, Vol. 20, No. 3 (2010) 219-232. http://arxiv.org/abs/1001.1122
2. A. N. Gorban, A. Y. Zinovyev, Principal Graphs and Manifolds, Ch. 2 in: Handbook of Research on Machine Learning Applications and Trends, IGI Global, Hershey, PA, USA, 2009, pp. 28-59. http://arxiv.org/abs/0809.0490
New Perspectives in Signal Processing Combining Dimensionality Reduction and Persistent Homology
Thu, 11:15-12:00
Mijail Guillemard
(joint work with Armin Iske)
Universität Hamburg
[email protected]
In the last few years we have seen unprecedented developments of new tools for the analysis of point cloud datasets. Dimensionality reduction and manifold learning are by now well-established research fields with challenging tasks and important applications in engineering problems. Parallel developments for the analysis of datasets have recently appeared in computational topology, where a new emergent topic is persistent homology. In this talk, we first introduce background information before discussing new perspectives for applying these tools in signal analysis. We discuss an illustrative framework for a topology-based filtering method in signal processing, as well as a toy example in image analysis.
Geometry and the Analysis of Massive Data Sets
Thu, 14:15-15:15
Michael Kirby
Colorado State University
Several algorithms will be presented that exploit geometric ideas for characterizing large data sets. We consider two distinct cases. In the first setting, the data may be viewed as residing on a manifold, or on multiple manifolds. In the second case we characterize the data in terms of subspaces and are concerned with their representation in terms of Grassmann, Stiefel or flag manifolds. The presentation will emphasize practical algorithms and a broad range of applications, including recent work in video-to-text.
New and improved Johnson-Lindenstrauss embeddings via the Restricted Isometry Property
Thu, 16:00-16:45
Felix Krahmer
(joint work with Rachel Ward)
Hausdorff Center for Mathematics, Universität Bonn
[email protected]
The Johnson-Lindenstrauss (JL) Lemma states that any set of p points in high-dimensional Euclidean space can be embedded into O(δ^-2 log(p)) dimensions without distorting the distance between any two points by more than a factor between 1 - δ and 1 + δ. We establish a new connection between the JL Lemma and the Restricted Isometry Property (RIP), a well-known concept in the theory of sparse recovery often used for showing the success of ℓ1-minimization.

Consider an m × N matrix satisfying the (k, δ_k)-RIP and an arbitrary set E of O(e^k) points in R^N. We show that, with high probability, such a matrix with randomized column signs maps E into R^m without distorting the distance between any two points by more than a factor of 1 ± 4δ_k. Consequently, matrices satisfying the Restricted Isometry Property of optimal order provide optimal Johnson-Lindenstrauss embeddings up to a logarithmic factor in N. Moreover, our results yield the best known bounds on the necessary embedding dimension m for a wide class of structured random matrices. In particular, for partial Fourier and partial Hadamard matrices, our method optimizes the dependence of m on the distortion δ: we improve the recent bound m = O(δ^-4 log(p) log^4(N)) of Ailon and Liberty (2010) to m = O(δ^-2 log(p) log^4(N)), which is optimal up to the logarithmic factors in N. Our results also have a direct application in the area of compressed sensing for redundant dictionaries.
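The flavour of the JL Lemma is easy to check empirically with a plain Gaussian random projection (a classical JL construction, not the RIP-based one of the talk). The point count, distortion level and the constant 8 below are arbitrary illustration choices:

```python
import numpy as np

rng = np.random.default_rng(4)
p, N, delta = 50, 2000, 0.3
m = int(8 * np.log(p) / delta ** 2)           # O(delta^-2 log p) target dimension
E = rng.normal(size=(p, N))                   # p points in N-dimensional space
A = rng.normal(size=(m, N)) / np.sqrt(m)      # Gaussian JL map, scaled to preserve norms
F = E @ A.T                                   # embedded points in R^m

# Maximal relative distortion over all pairwise distances.
i, j = np.triu_indices(p, k=1)
orig = np.linalg.norm(E[i] - E[j], axis=1)
proj = np.linalg.norm(F[i] - F[j], axis=1)
ratio = proj / orig
print(ratio.min(), ratio.max())               # within [1 - delta, 1 + delta] w.h.p.
```

The contribution of the talk is that structured matrices with the RIP, after column-sign randomization, achieve the same distortion guarantee while supporting fast matrix-vector multiplication.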
Friday, 03.06.2011
Spectral Analysis of Alignment Matrices in Manifold Learning
Fri, 09:30-10:30
Zhenyue Zhang
Department of Mathematics, Zhejiang University
Local methods for manifold learning generate a collection of local parameterizations which is then aligned to produce a global parameterization of the underlying manifold. The alignment procedure is carried out through the computation of a partial eigendecomposition of a so-called alignment matrix from a finite number of data points. In this talk, we will provide insights into the behaviour and performance of local manifold learning algorithms. We will present an analysis of the eigenstructure of the alignment matrix, including 1) error estimates for the alignment matrix computed from data relative to an underlying alignment matrix of the parameters (the ideal alignment matrix without errors), 2) necessary and sufficient conditions under which the eigen-subspace of the ideal alignment matrix recovers the global parameterization, and 3) an estimate of the gap of the null space of the alignment matrix of parameters, deriving a quantitative measure of how stably the null space can be computed numerically.
Current Issues in Statistical Analysis on Manifolds for Computational Anatomy
Fri, 11:15-12:15
Xavier Pennec
Asclepios project-team, INRIA Sophia-Antipolis
F-06902 Sophia-Antipolis Cedex, France
[email protected]
Computational anatomy aims at analyzing and modeling the biological variability of the human anatomy. The method is to identify anatomically representative geometric features (points, tensors, curves, surfaces, volume transformations), and to describe and compare their statistical distribution in different populations. As these geometric features most often belong to manifolds that have no canonical Euclidean structure, we have to rely on more elaborate algorithmic foundations.
I will first describe the Riemannian structure, which proves to be powerful for developing a consistent framework for simple statistics on manifolds. It can be further extended to a complete computing framework on manifold-valued images. For instance, the choice of a convenient Riemannian metric on symmetric positive definite (SPD) matrices allows one to generalize consistently to fields of SPD matrices (e.g. DTI images) many important geometric data processing algorithms.
Then I will focus on more complex features such as curves and surfaces, which raise the problem of infinite-dimensional manifolds. I will present here the approach developed by Stanley Durrleman during his PhD. This is a generative model combining a random diffeomorphic deformation model à la Grenander & Miller, which encodes the geometric variability of the anatomical template, with a random residual shape variability model (à la Kendall). One of its specificities is the encoding of curves, sets of curves and surfaces using currents (from geometric measure theory) provided with a kernel metric. This leads to an interesting, robust and computable distance between surfaces. Example applications include the building of a statistical model of the remodeling of the heart in rToF. We then extend this model to longitudinal models of growth and evolution by combining a time diffeomorphism with a static space diffeomorphism that models the inter-subject or inter-species variability. We illustrate the framework with recent results on the shape of the endocast of monkeys closely related to humans.
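A minimal instance of 'simple statistics on manifolds' in the Riemannian framework is the Fréchet (Karcher) mean, computed here on the unit sphere by iterating the exponential and logarithm maps. This generic sketch is illustrative only, not code from the talk, and the sample point cloud is made up:

```python
import numpy as np

def sphere_log(p, q):
    """Log map on the unit sphere: tangent vector at p pointing to q, with geodesic length."""
    w = q - np.dot(p, q) * p                              # project q onto the tangent space at p
    nw = np.linalg.norm(w)
    theta = np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))   # geodesic distance
    return np.zeros_like(p) if nw < 1e-12 else (theta / nw) * w

def sphere_exp(p, v):
    """Exp map: follow the geodesic from p in tangent direction v."""
    t = np.linalg.norm(v)
    return p if t < 1e-12 else np.cos(t) * p + np.sin(t) * v / t

def frechet_mean(points, iters=50):
    """Karcher mean: iterate p <- exp_p(average of log_p(x_i))."""
    p = points[0]
    for _ in range(iters):
        p = sphere_exp(p, np.mean([sphere_log(p, q) for q in points], axis=0))
    return p

# Noisy directions clustered around the north pole: the Fréchet mean stays on the sphere.
rng = np.random.default_rng(5)
pole = np.array([0.0, 0.0, 1.0])
pts = np.array([x / np.linalg.norm(x) for x in pole + 0.2 * rng.normal(size=(30, 3))])
mean = frechet_mean(pts)
print(mean)   # a unit vector near the north pole
```

Unlike the arithmetic average (which leaves the sphere and must be re-projected), this intrinsic mean is defined purely through the manifold's geodesics, which is the consistency the talk's Riemannian framework builds on.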