
Spectral multidimensional scaling

Yonathan Aflalo^{a,1} and Ron Kimmel^{b}

Departments of ^{a}Electrical Engineering and ^{b}Computer Science, Technion–Israel Institute of Technology, Haifa 32000, Israel

    Edited* by Haim Brezis, Rutgers, The State University of New Jersey, Piscataway, NJ, and approved September 10, 2013 (received for review May 8, 2013)

An important tool in information analysis is dimensionality reduction. There are various approaches for large data simplification by scaling its dimensions down that play a significant role in recognition and classification tasks. The efficiency of dimension reduction tools is measured in terms of memory and computational complexity, which are usually a function of the number of the given data points. Sparse local operators that involve substantially less than quadratic complexity at one end, and faithful multiscale models with quadratic cost at the other end, make the design of a dimension reduction procedure a delicate balance between modeling accuracy and efficiency. Here, we combine the benefits of both and propose a low-dimensional multiscale modeling of the data, at a modest computational cost. The idea is to project the classical multidimensional scaling problem into the data spectral domain extracted from its Laplace–Beltrami operator. There, embedding into a small dimensional Euclidean space is accomplished while optimizing for a small number of coefficients. We provide theoretical support and demonstrate that working in the natural eigenspace of the data, one could reduce the process complexity while maintaining the model fidelity. As examples, we efficiently canonize nonrigid shapes by embedding their intrinsic metric into $\mathbb{R}^3$, a method often used for matching and classifying almost isometric articulated objects. Finally, we demonstrate the method by exposing the style in which handwritten digits appear in a large collection of images. We also visualize clustering of digits by treating images as feature points that we map to a plane.

flat embedding | distance maps | big data | diffusion geometry

Manifold learning refers to the process of mapping given data into a simple low-dimensional domain that reveals properties of the data. When the target space is Euclidean, the procedure is also known as flattening. The question of how to efficiently and reliably flatten given data is a challenge that occupies the minds of numerous researchers. Distance-preserving data-flattening procedures can be found in the fields of geometry processing, the mapmaker problem through exploration of biological surfaces (1–3), texture mapping in computer graphics (4, 5), nonrigid shape analysis (6), image (7–9) and video understanding (7, 10), and computational biometry (11), to name just a few. The flat embedding is usually a simplification process that aims to preserve, as much as possible, distances between data points in the original space, while being efficient to compute. One family of flattening techniques is multidimensional scaling (MDS), which attempts to map all pairwise distances between data points into small dimensional Euclidean domains. A review of MDS applications in psychophysics can be found in ref. 12, which includes the computational realization that human color perception is 2D.

A proper way to explore the geometry of given data points involves computing all pairwise distances. Then, a flattening procedure should attempt to keep the distance between all couples of corresponding points in the low-dimensional Euclidean domain. Computing pairwise distances between points was addressed by extension of local distances on the data graph (1, 13), or by consistent numerical techniques for distance computation on surfaces (14). The complexity involved in storing all pairwise distances is quadratic in the number of data points, which is a limiting factor in large databases. Alternative models try to keep the size of the input as low as possible by limiting the input to distances of just nearby points.

One such example is locally linear embedding (15), which attempts to map data points into a flat domain where each feature coordinate is a linear combination of the coordinates of its neighbors. The minimization, in this case, is for keeping the combination similar in the given data and the target flat domain. Because only local distances are analyzed, the pairwise distances matrix is sparse, with an effective size of $O(n)$, where $n$ is the number of data points. Along the same line, the Hessian locally linear embedding (16) tries to better fit the local geometry of the data to the plane. Belkin and Niyogi (17) suggested embedding data points into the Laplace–Beltrami eigenspace for the purpose of data clustering. The idea is, yet again, to use distances of only nearby points by which a Laplace–Beltrami operator (LBO) is defined. The data can then be projected onto the first eigenvectors that correspond to the smallest eigenvalues of the LBO. The question of how to exploit the LBO decomposition to construct diffusion geometry was addressed by Coifman and coworkers (18, 19).

De Silva and Tenenbaum (20) recognized the computational difficulty of dealing with a full matrix of pairwise distances, and proposed working with a subset of landmarks, which is used for interpolation in the embedded space. Next, Bengio et al. (21) proposed to extend subsets by Nyström extrapolation, within a space that is an empirical Hilbert space of functions obtained through kernels that represent probability distributions; they did not incorporate the geometry of the original data. Finally, assuming $n$ data points, Asano et al. (22) proposed reducing both memory and computational complexity by subdividing the problem into $O(\sqrt{n})$ subsets of $O(\sqrt{n})$ data points each. That reduction may be feasible when distances between data points can be computed in constant time, which is seldom the case.

The computational complexity of multidimensional scaling was addressed by a multigrid approach in ref. 23, and by vector extrapolation techniques in ref. 24. In both cases the acceleration, although effective, required all pairwise distances, an $O(n^2)$ input.

Significance

Scientific theories describe observations by equations with a small number of parameters or dimensions. Memory and computational efficiency of dimension reduction procedures is a function of the size of the observed data. Sparse local operators that involve almost linear complexity, and faithful multiscale models with quadratic cost, make the design of dimension reduction tools a delicate balance between modeling accuracy and efficiency. Here, we propose multiscale modeling of the data at a modest computational cost. We project the classical multidimensional scaling problem into the data spectral domain. Then, embedding into a low-dimensional space is efficiently accomplished. Theoretical support and empirical evidence demonstrate that working in the natural eigenspace of the data, one could reduce the complexity while maintaining model fidelity.

Author contributions: Y.A. and R.K. designed research and wrote the paper.

The authors declare no conflict of interest.

*This Direct Submission article had a prearranged editor.

Freely available online through the PNAS open access option.

See Commentary on page 18031.

1To whom correspondence should be addressed. E-mail: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1308708110/-/DCSupplemental.


An alternative approach to deal with the memory complexity was needed.

In ref. 25, geodesics that are suspected to distort the embedding due to topology effects were filtered out in an attempt to reduce distortions. Such filters eliminate some of the pairwise distances, yet not to the extent of substantial computational or memory savings, because the goal was mainly reduction of flattening distortions. In ref. 26, the eigenfunctions of the LBO on a surface were interpolated into the volume bounded by the surface. This procedure was designed to overcome the need to evaluate the LBO inside the volume, as proposed in ref. 27. Both models deal with either surface or volume LBO decomposition of objects in $\mathbb{R}^3$, and were not designed to flatten and simplify structures in big data.

Another dimensionality reduction method, intimately related to MDS, is the principal component analysis (PCA) method. The PCA procedure projects the points to a low-dimensional space by minimizing the least-squares fitting error. In ref. 28, that relation was investigated through kernel PCA, which is related to the problem we plan to explore.

Here, we use the fact that the gradient magnitude of distance functions on the manifold is equal to 1 almost everywhere. Interpolation of such smooth functions could be efficiently obtained by projecting the distances into the LBO eigenspace. In fact, translating the problem into its natural spectral domain enables us to reduce the complexity of the flattening procedure. The efficiency is established by considering a small fraction of the pairwise distances that are projected onto the LBO leading eigenfunctions.

Roughly speaking, the differential structure of the manifold is captured by the eigenbasis of the LBO, whereas its multiscale structure is encapsulated by sparse sampling of the pairwise distances matrix. We exemplify this idea by extracting the $\mathbb{R}^4$ structure of points in $\mathbb{R}^{10{,}000}$, canonizing surfaces to cope with nonrigid deformations, and mapping images of handwritten digits to the plane.

The proposed framework operates on abstract data in any dimension. In the first section, we prove the asymptotic behavior of using the leading eigenvectors of the LBO for representing smooth functions on a given manifold. Next, we demonstrate how to interpolate geodesic distances by projecting a fraction of the distances between data points into the LBO eigenspace. We then formulate the classical MDS problem in the LBO eigenspace. A relation to diffusion geometry (18, 19, 29) is established. Experimental results demonstrate the complexity savings.

    Spectral Projection

Let us consider a Riemannian manifold, $\mathcal{M}$, equipped with a metric $G$. The manifold and its metric induce a Laplace–Beltrami operator (LBO), often denoted by $\Delta_G$, and here w.l.o.g. by $\Delta$. The LBO is self-adjoint and defines a set of functions called eigenfunctions, denoted $\phi_i$, such that $\Delta\phi_i = \lambda_i\phi_i$ and $\int_{\mathcal{M}} \phi_i(x)\phi_j(x)\,da(x) = \delta_{ij}$, where $da$ is an area element. These functions have been popular in the field of shape analysis; they have been used to construct descriptors (30) and to linearly relate between metric spaces (31). In particular, for problems involving functions defined on a triangulated surface, when the number of vertices of $\mathcal{M}$ is large, we can reduce the dimensionality of the functional space by considering smooth functions defined in a subspace spanned by just a couple of eigenfunctions of the associated LBO.

To prove the efficiency of LBO eigenfunctions in representing smooth functions, consider a smooth function $f$ with bounded gradient magnitude $\|\nabla_G f\|_G$. Define the representation residual function $r_n = f - \sum_{i=1}^{n} \langle f,\phi_i\rangle_G\,\phi_i$. It is easy to see that $\forall i,\ 1\le i\le n,\ \langle r_n,\phi_i\rangle_G = 0$, and $\forall i,\ 1\le i\le n,\ \langle\nabla_G r_n,\nabla_G\phi_i\rangle_G = \langle r_n,\Delta_G\phi_i\rangle_G = \lambda_i\langle r_n,\phi_i\rangle_G = 0$. Using these properties and assuming the eigenvalues are ordered in ascending order, $\lambda_1\le\lambda_2\le\dots$, we readily have that

$\|r_n\|_G^2 = \Big\|\sum_{i=n+1}^{\infty}\langle r_n,\phi_i\rangle_G\,\phi_i\Big\|_G^2 = \sum_{i=n+1}^{\infty}\langle r_n,\phi_i\rangle_G^2$

and

$\|\nabla_G r_n\|_G^2 = \Big\|\sum_{i=n+1}^{\infty}\langle r_n,\phi_i\rangle_G\,\nabla_G\phi_i\Big\|_G^2 \ge \lambda_{n+1}\sum_{i=n+1}^{\infty}\langle r_n,\phi_i\rangle_G^2.$

Then, $\|\nabla_G r_n\|_G^2 \ge \lambda_{n+1}\|r_n\|_G^2$. Moreover, as for $i\in\{1,\dots,n\}$ the inner product vanishes, $\langle\nabla_G r_n,\nabla_G\phi_i\rangle_G = 0$, we have $\|\nabla_G f\|_G^2 = \big\|\nabla_G r_n + \sum_{i=1}^{n}\langle f,\phi_i\rangle_G\,\nabla_G\phi_i\big\|_G^2 = \|\nabla_G r_n\|_G^2 + \sum_{i=1}^{n}\langle f,\phi_i\rangle_G^2\,\lambda_i$. It follows that

$\|r_n\|_G^2 \le \frac{\|\nabla_G r_n\|_G^2}{\lambda_{n+1}} \le \frac{\|\nabla_G f\|_G^2}{\lambda_{n+1}}.$

Finally, for $d$-dimensional manifolds, as shown in ref. 32, the spectrum has a linear behavior in $n^{2/d}$; that is, $\lambda_n \approx C_1 n^{2/d}$ as $n\to\infty$. Moreover, the residual function $r_n$ depends linearly on $\|\nabla_G f\|$, which is bounded by a constant. It follows that $r_n$ converges asymptotically to zero at a rate of $O(n^{-2/d})$, and this convergence depends on $\|\nabla_G f\|_G^2$; i.e., $\exists C_2$, such that

$\forall f:\mathcal{M}\to\mathbb{R},\quad \|r_n\|_G^2 \le C_2\,\frac{\|\nabla_G f\|_G^2}{n^{2/d}}. \quad [1]$
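To make the decay concrete, here is a minimal numerical sketch (not from the paper): a cycle-graph Laplacian stands in for the LBO of a closed 1D manifold, and the function, sizes, and variable names are our own illustration.

```python
import numpy as np

# Residual decay of Eq. 1, illustrated on a cycle graph whose Laplacian
# plays the role of a discrete LBO of a 1D manifold.
n = 400
idx = np.arange(n)
W = np.zeros((n, n))
W[idx, (idx + 1) % n] = 1.0
W[idx, (idx - 1) % n] = 1.0
L = np.diag(W.sum(axis=1)) - W

lam, phi = np.linalg.eigh(L)           # eigenvalues in ascending order

t = 2.0 * np.pi * idx / n
f = np.sin(t) + 0.5 * np.cos(3 * t)    # smooth function with bounded gradient

coeffs = phi.T @ f                     # <f, phi_i>
for m in (4, 8, 16, 32):
    r_m = f - phi[:, :m] @ coeffs[:m]  # residual r_n of the derivation above
    print(m, np.linalg.norm(r_m))      # norm shrinks rapidly as m grows
```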

    Spectral Interpolation

Let us consider a $d$-dimensional manifold, $\mathcal{M}$, sampled at $n$ points $\{V_i\}$, and $J$ a subset of $\{1,2,\dots,n\}$ such that $|J| = m_s \le n$. Given a smooth function $f$ defined on $V_J = \{V_j;\ j\in J\}$, our first question is how to interpolate $f$, that is, how to construct a continuous function $\tilde f$ such that $\tilde f(V_j) = f(V_j),\ \forall j\in J$. One simple solution is linear interpolation. Next, assume the function $\tilde f$ is required to be as smooth as possible in an $L_2$ sense. We measure the smoothness of a function by $E_{\text{smooth}}(f) = \int_{\mathcal{M}} \|\nabla f\|_2^2\,da$. The problem of smooth interpolation could be rewritten as $\min_{h:\mathcal{M}\to\mathbb{R}} E_{\text{smooth}}(h)$ s.t. $\forall j\in J,\ h(V_j) = f(V_j)$. We readily have $\int_{\mathcal{M}} \|\nabla h\|_2^2\,da = \int_{\mathcal{M}} \langle\Delta h, h\rangle\,da$. Our interpolation problem could then be written as

$\min_{h:\mathcal{M}\to\mathbb{R}} \int_{\mathcal{M}} \langle\Delta h, h\rangle\,da \quad \text{s.t. } h(V_j) = f(V_j)\ \forall j\in J. \quad [2]$

Any discretization matrix of the LBO could be used for the eigenspace construction. Here, we adopt the general form $L = A^{-1}W$, where $A^{-1}$ is a diagonal matrix whose entries are inversely proportional to the metric infinitesimal volume elements. One example is the cotangent weights approximation (33) for triangulated surfaces; another example is the generalized discrete LBO suggested in ref. 17. In the first case, $W$ is a matrix in which each off-diagonal entry is zero if the two vertices indicated by the two matrix indices do not share a triangle's edge, or the sum of the cotangents of the angles supported by the edge connecting the corresponding vertices. In the latter case, $W$ is the graph Laplacian, where the graph is constructed by connecting nearest neighbors. Like any Laplacian matrix, the diagonal is the negative sum of the off-diagonal elements of the corresponding row. $A$, in the triangulated surface case, is a diagonal matrix in which the element $A_{ii}$ is proportional to the area of the triangles about the vertex $V_i$. A similar normalization factor should apply in the more general case.

A discrete version of the smooth interpolation problem 2 can be rewritten as

$\min_x\ x^T W x \quad \text{s.t. } Bx = f, \quad [3]$

where $B$ is the projection matrix onto the space spanned by the vectors $e_j,\ j\in J$, where $e_j$ is the $j$th canonical basis vector, and $f$ now represents the sampled vector $f(V_J)$.

To reduce the dimensionality of the problem, we introduce $\tilde f$, the spectral projection of $f$ onto the set of eigenfunctions of the LBO $\{\phi_i\}_{i=1}^{m_e}$, as $\tilde f = \sum_{i=1}^{m_e}\langle f,\phi_i\rangle\,\phi_i$, where $\Delta\phi_i = \lambda_i\phi_i$. Then, denoting by $\Phi$ the matrix whose $i$th column is $\phi_i$, we have $\tilde f = \Phi\alpha$, where $\alpha$ is a vector such that $\alpha_i = \langle f,\phi_i\rangle$. Problem 3 can now be approximated by


$\min_{\alpha\in\mathbb{R}^{m_e}}\ \alpha^T\Phi^T W\Phi\,\alpha \quad \text{s.t. } B\Phi\alpha = f. \quad [4]$

By definition, $\Phi^T W\Phi = \Lambda$, where $\Lambda$ represents the diagonal matrix whose elements are the eigenvalues of $L$. We can alternatively incorporate the constraint as a penalty in the target function, and rewrite our problem as $\min_{\alpha\in\mathbb{R}^{m_e}} \mu\,\alpha^T\Lambda\alpha + \|B\Phi\alpha - f\|^2 = \min_{\alpha\in\mathbb{R}^{m_e}} \mu\,\alpha^T\Lambda\alpha + \alpha^T\Phi^T B^T B\Phi\alpha - 2f^T B\Phi\alpha$. The solution to this problem is given by

$\alpha = \big(\mu\Lambda + \Phi^T B^T B\Phi\big)^{-1}\Phi^T B^T f = Mf. \quad [5]$
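In code, Eq. 5 is a single regularized linear solve. The sketch below is illustrative rather than the authors' implementation; `Phi`, `lam`, `J`, `f_J`, and `mu` (the eigenvectors and eigenvalues of a discrete LBO, the sample indices, the sampled values, and the penalty weight) are assumed inputs.

```python
import numpy as np

# Smooth spectral interpolation, Eq. 5 (sketch).
def spectral_interpolate(Phi, lam, J, f_J, mu=1e-3):
    PhiJ = Phi[J, :]                   # B Phi: rows of Phi at the samples
    # alpha = (mu*Lambda + Phi^T B^T B Phi)^{-1} Phi^T B^T f
    alpha = np.linalg.solve(mu * np.diag(lam) + PhiJ.T @ PhiJ, PhiJ.T @ f_J)
    return Phi @ alpha                 # interpolant evaluated at all n points
```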

Next, we use the above interpolation expressions to formulate the pairwise geodesics matrix in a compact yet accurate manner. The smooth interpolation problem 2 for a pairwise distances function can be defined as follows. In this setting, let $I = J\times J$ be the set of pairs of indices of data points, and $F(V_i,V_j)$ a value defined for each pair of points (or vertices in the surface example) $V_i, V_j$, where $(i,j)\in I$. We would like to interpolate the smooth function $D:\mathcal{M}\times\mathcal{M}\to\mathbb{R}$, whose values are given at $(V_i,V_j),\ (i,j)\in I$, by $D(V_i,V_j) = F(V_i,V_j),\ \forall(i,j)\in I$. For that goal, we first define a smooth-energy measure for such functions. We proceed as before by introducing

$E_{\text{smooth}}(D) = \iint_{\mathcal{M}\times\mathcal{M}} \Big(\|\nabla_x D(x,y)\|^2 + \|\nabla_y D(x,y)\|^2\Big)\,da(x)\,da(y).$

The smooth interpolation problem can now be written as

$\min_{D:\mathcal{M}\times\mathcal{M}\to\mathbb{R}} E_{\text{smooth}}(D) \quad \text{s.t. } D(V_i,V_j) = F(V_i,V_j),\ \forall(i,j)\in I. \quad [6]$

Given any matrix $D$, such that $D_{ij} = D(V_i,V_j)$, we have

$\int_{\mathcal{M}} \|\nabla_x D(x,y_j)\|^2\,da(x) = \int_{\mathcal{M}} \big\langle \Delta_x D(x,y_j),\, D(x,y_j)\big\rangle\,da(x) \approx D_j^T W D_j = \operatorname{trace}\big(W D_j D_j^T\big),$

where $D_j$ represents the $j$th column of $D$. Then,

$\iint_{\mathcal{M}\times\mathcal{M}} \|\nabla_x D(x,y)\|^2\,da(x)\,da(y) \approx \sum_j A_{jj}\,D_j^T W D_j = \operatorname{trace}\big(D^T W D A\big).$

A similar result applies to $\iint_{\mathcal{M}\times\mathcal{M}} \|\nabla_y D(x,y)\|^2\,da(x)\,da(y)$. Then,

$\iint_{\mathcal{M}\times\mathcal{M}} \|\nabla_x D(x,y)\|^2\,da(x)\,da(y) \approx \operatorname{trace}\big(D^T W D A\big),$

$\iint_{\mathcal{M}\times\mathcal{M}} \|\nabla_y D(x,y)\|^2\,da(x)\,da(y) \approx \operatorname{trace}\big(D W D^T A\big). \quad [7]$

The smooth energy can now be discretized for a matrix $D$ by

$E_{\text{smooth}}(D) = \operatorname{trace}\big(D^T W D A\big) + \operatorname{trace}\big(D W D^T A\big). \quad [8]$

The spectral projection of $D$ onto $\Phi$ is given by

$\tilde D(x,y) = \sum_i \langle D(\cdot,y),\phi_i\rangle\,\phi_i(x) = \sum_{i=1}^{n}\sum_{j=1}^{n} \underbrace{\big\langle\langle D(u,v),\phi_i(u)\rangle,\phi_j(v)\big\rangle}_{\alpha_{ij}}\,\phi_j(y)\,\phi_i(x),$

where $\alpha_{ij} = \iint_{\mathcal{M}\times\mathcal{M}} D(x,y)\,\phi_i(x)\,\phi_j(y)\,da(x)\,da(y)$. In matrix notation, denoting by $\tilde D$ the matrix such that $\tilde D_{ij} = \tilde D(x_i,y_j)$, we can show that

$\tilde D = \Phi\,\alpha\,\Phi^T. \quad [9]$

Combining Eqs. 8 and 9, we can define the discrete smooth energy of the spectral projection of a function as

$\tilde E_{\text{smooth}}(D) = \operatorname{trace}\big(\Phi\alpha^T\Phi^T W\Phi\alpha\Phi^T A\big) + \operatorname{trace}\big(\Phi\alpha\Phi^T W\Phi\alpha^T\Phi^T A\big)$
$\qquad = \operatorname{trace}\big(\alpha^T\Lambda\alpha\,\Phi^T A\Phi\big) + \operatorname{trace}\big(\alpha\Lambda\alpha^T\,\Phi^T A\Phi\big)$
$\qquad = \operatorname{trace}\big(\alpha^T\Lambda\alpha\big) + \operatorname{trace}\big(\alpha\Lambda\alpha^T\big). \quad [10]$

Using the spectral smooth representation introduced in Eq. 9, we can define the smooth spectral interpolation problem for a function from $\mathcal{M}\times\mathcal{M}$ to $\mathbb{R}$ as

$\min_{\alpha\in\mathbb{R}^{m_e\times m_e}} \operatorname{trace}\big(\alpha^T\Lambda\alpha\big) + \operatorname{trace}\big(\alpha\Lambda\alpha^T\big) \quad \text{s.t. } \big(\Phi\alpha\Phi^T\big)_{ij} = D(V_i,V_j),\ \forall(i,j)\in I. \quad [11]$

Expressing the constraint again as a penalty function, we end up with the following optimization problem

$\min_{\alpha\in\mathbb{R}^{m_e\times m_e}} \operatorname{trace}\big(\alpha^T\Lambda\alpha\big) + \operatorname{trace}\big(\alpha\Lambda\alpha^T\big) + \mu\sum_{(i,j)\in I}\Big\|\big(\Phi\alpha\Phi^T\big)_{ij} - D(V_i,V_j)\Big\|_F^2, \quad [12]$

where $\|\cdot\|_F$ represents the Frobenius norm, and $m_e$ the number of eigenfunctions. Problem 12 is a minimization problem of a quadratic function of $\alpha$. Then, representing $\alpha$ as an $m_e^2\times 1$ vector $\vec\alpha$, the problem can be rewritten as a quadratic programming problem. We can find a matrix $\bar M$, similar to Eq. 5, such that $\vec\alpha = \bar M\vec D$, where $\vec D$ is the row-stack vector of the matrix $D(V_i,V_j)$.

Another, less accurate but more efficient, approach to obtain an approximation of the matrix $\alpha$ is to compute

$\alpha = MFM^T, \quad [13]$

where $F$ is the matrix defined by $F_{ij} = F(V_i,V_j)$ and $M$ is the matrix introduced in Eq. 5.

Notice that spectral projection is a natural choice for the spectral interpolation of distances because the eigenfunctions encode the manifold geometry, as do distance functions. Moreover, the eikonal equation, which models geodesic distance functions on the manifold, is defined by $\|\nabla_G D\| = 1$. Eq. 1, in this case, provides us with a clear asymptotic convergence rate for the spectral projection of the function $D$, because $\|\nabla_G D\|$ is a constant equal to 1.

At this point, we have all of the ingredients to present the spectral MDS. For simplicity, we limit our discussion to classical scaling.

    Spectral MDS

MDS is a family of dimensionality reduction techniques that attempt to find a simple representation for a dataset given by the distances between every two data points. Given any metric space $\mathcal{M}$ equipped with a metric $D:\mathcal{M}\times\mathcal{M}\to\mathbb{R}$, and $V = \{V_1, V_2,\dots, V_n\}$ a finite set of elements of $\mathcal{M}$, the multidimensional scaling of $V$ in $\mathbb{R}^k$ involves finding a set of points $X = \{X_1, X_2,\dots, X_n\}$ in $\mathbb{R}^k$ whose pairwise Euclidean distances $d(X_i,X_j) = \|X_i - X_j\|_2$ are as close as possible to $D(V_i,V_j)$ for all $i,j$. Such an embedding, for the MDS


member known as classical scaling, can be realized by the following minimization program: $\min_X \big\|XX^T + \frac12 JDJ\big\|_F$, where $D$ is a matrix defined by $D_{ij} = D(V_i,V_j)^2$ and $J = I_n - \frac1n 1_n 1_n^T$, or $J_{ij} = \delta_{ij} - 1/n$. Classical scaling finds the first $k$ singular vectors and corresponding singular values of the matrix $-\frac12 JDJ$. The classical scaling solver requires the computation of an $n\times n$ matrix of distances, which is a challenging task when dealing with more than, say, thousands of data points. The quadratic size of the input data imposes severe time and space limitations. Several acceleration methods for MDS were tested over the years (20–24). However, to the best of our knowledge, the multiscale geometry of the data was never used for lowering the space and time complexities during the reduction process. Here, we show how spectral interpolation, in the case of classical scaling, is used to overcome these complexity limitations.

In the spectral MDS framework we first select a subset of $m_s$ points, with indices $J$, of the data. For efficient covering of the data manifold, this set can be selected using the farthest-point sampling strategy, which is known to be 2-optimal in the sense of covering; see the sketch below. We then compute the geodesic distances between every two points $(V_i,V_j)\in\mathcal{M}\times\mathcal{M},\ (i,j)\in I = J\times J$.
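A farthest-point sampling step needs only one distance-to-set update per added point. In the sketch below, Euclidean distances stand in for the geodesic metric and all names are our own.

```python
import numpy as np

# Farthest-point sampling (illustrative): points is an (n x dim) array; any
# metric, e.g., geodesic distances, can replace the Euclidean one used here.
def farthest_point_sampling(points, m_s, seed=0):
    dist_from = lambda i: np.linalg.norm(points - points[i], axis=1)
    J = [seed]
    d = dist_from(seed)                 # distance of each point to the set J
    for _ in range(m_s - 1):
        j = int(np.argmax(d))           # farthest point from the current set
        J.append(j)
        d = np.minimum(d, dist_from(j)) # update distances to the enlarged set
    return J
```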

Now, we are ready to choose any Laplacian operator that is constructed from the local relations between data points. The LBO takes into consideration the differential geometry of the data manifold. We compute the Laplacian's first $m_e$ eigenvectors, which are arranged in an eigenbasis matrix denoted by $\Phi$. Finally, we extract the spectral interpolation matrix $\alpha$ from the computed geodesic distances and the eigenbasis $\Phi$ using Eq. 12 or 13.

The interpolated distance matrix $\tilde D$ between every two points of $\mathcal{M}$ can be computed as $\tilde D = \Phi\alpha\Phi^T$. It is important to note that there is no need to compute this matrix explicitly.

At this point let us justify our choice of representation. Denote by $D_y:\mathcal{M}\to\mathbb{R}^+$ the geodesic distance function from a source point $y$ to the rest of the points in the manifold. Geodesic distance functions are characterized by the eikonal equation $\|\nabla_G D_y\|^2 = 1$. This property can be used in the bound provided by Eq. 1. Specifically, $\|\nabla_G D_y^2\|^2 = \|2D_y\nabla_G D_y\|^2 \le 4\|D_y\|^2$. Plugging this relation into Eq. 1, the error of the squared geodesic distance function projected to the spectral basis is given by

$\frac{\|r_n\|_G^2}{\|D\|_G^2} \le \frac{4C_2}{n^{2/d}},$

where $\|D\|_G$ is the diameter of the manifold. We thereby obtained a bound on the relative projection error that depends only on Weyl's constant and the number of eigenvectors used in the reconstruction by projection.

Next, the spectral MDS solution is given by $XX^T = -\frac12 J\tilde D J$. This decomposition can be approximated by two alternative methods. The first method considers the projection of $X$ onto the eigenbasis $\Phi$, which is equivalent to representing $X$ by $\tilde X = \Phi\beta$. To that end, we have to solve $\Phi\beta\beta^T\Phi^T = -\frac12 J\Phi\alpha\Phi^T J$. Because $\Phi^T A\Phi = I_{m_e}$, we have

$\beta\beta^T = -\frac12 \underbrace{\Phi^T A J\Phi}_{H}\,\alpha\,\Phi^T J A\Phi = -\frac12 H\alpha H^T, \quad [14]$

which leads to the singular value decomposition of the $m_e\times m_e$ matrix $-\frac12 H\alpha H^T$.

In the second method, to overcome potential distortions caused by ignoring the high-frequency components, one may prefer to decompose the matrix $\Phi\alpha\Phi^T$ using the algebraic trick presented in Algorithm 1.

Fig. 1. Canonical forms of Armadillo (Left) and Homer (Right). Within each box is (from left to right) the shape, followed by canonical forms obtained by regular MDS and spectral MDS.

Fig. 2. Spectral MDS compared with MDS with the same number of sampled points (same complexity). Each point in the plane on the central frames represents a shape. The input to the MDS producing the position of the points was distances between the corresponding canonical forms (MDS applied to the result of MDS). Red points represent dogs, and blue points represent lionesses. The spectral MDS result (Right) provides better clustering and separation between classes compared with the regular MDS (Left).


Algorithm 1: Spectral MDS

Require: A data manifold $\mathcal{M}$ densely sampled at $n$ points, the number of subsampled points $m_s$, and the number of eigenvectors $m_e$.

Compute $J$, a subset of $m_s$ points sampled from $\mathcal{M}$.
Compute the matrix $D$ of squared geodesic distances between every two points $p_i, p_j$; $i\in J,\ j\in J$.
Compute the matrices $\Phi, \Lambda$ containing the first $m_e$ eigenvectors and eigenvalues of the LBO of $\mathcal{M}$.
Compute the matrix $\alpha$ according to Eq. 12 or 13.
Compute the SVD of the $n\times m_e$ matrix $J\Phi$, such that $J\Phi = SUV^T$, where $J = I_n - \frac1n 1_n 1_n^T$.
Compute the eigendecomposition of the $m_e\times m_e$ matrix $-\frac12 UV^T\alpha VU$, such that $-\frac12 UV^T\alpha VU = P\Sigma P^T$.
Compute the matrix $Q = SP\Sigma^{1/2}$, such that $QQ^T = -\frac12 J\Phi\alpha\Phi^T J$.
Return: the first $k$ columns of the matrix $Q$.
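The steps of Algorithm 1 can be strung together in a few lines of NumPy. The sketch below assumes precomputed LBO eigenpairs and sampled squared geodesic distances, uses the Eq. 13 variant for $\alpha$, and is an illustration under those assumptions, not the authors' code.

```python
import numpy as np

# Spectral MDS following Algorithm 1 (sketch).
# Phi: (n x m_e) LBO eigenvectors; lam: (m_e,) eigenvalues; J: m_s sample
# indices; D2: (m_s x m_s) squared geodesic distances; k: target dimension.
def spectral_mds(Phi, lam, J, D2, k, mu=1e-3):
    PhiJ = Phi[J, :]
    M = np.linalg.solve(mu * np.diag(lam) + PhiJ.T @ PhiJ, PhiJ.T)
    alpha = M @ D2 @ M.T                      # Eq. 13
    JPhi = Phi - Phi.mean(axis=0)             # J Phi, without forming J itself
    S, u, Vt = np.linalg.svd(JPhi, full_matrices=False)   # J Phi = S U V^T
    C = -0.5 * (np.diag(u) @ Vt @ alpha @ Vt.T @ np.diag(u))
    w, P = np.linalg.eigh((C + C.T) / 2)      # symmetrize before decomposing
    order = np.argsort(w)[::-1][:k]           # keep the k largest eigenvalues
    return S @ P[:, order] * np.sqrt(np.maximum(w[order], 0.0))  # first k columns of Q
```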

Relation to diffusion maps: An interesting flat-embedding option is known as diffusion maps (18, 29). Given a kernel function $K$, the diffusion distance between two points $x,y\in\mathcal{M}$ is defined as

$D^2(x,y) = \sum_k \big(\phi_k(x) - \phi_k(y)\big)^2 K(\lambda_k), \quad [15]$

where $\phi_k$ represents the $k$th eigenfunction of $\Delta_{\mathcal{M}}$, and $\lambda_k$ its associated eigenvalue. Note that $D_{ij} = D^2(x_i,x_j)$. In matrix form, Eq. 15 reads $D_{ij} = \sum_{k=1}^{m_e} (\Phi_{ik} - \Phi_{jk})^2 K_{kk}$, where $\Phi$ represents the eigendecomposition matrix of the LBO and $K$ is a diagonal matrix such that $K_{kk} = K(\lambda_k)$.

Denoting by $\Psi$ the matrix such that $\Psi_{ij} = \Phi_{ij}^2$, it follows that $D = \Psi K 1_{m_e} 1_n^T + \big(\Psi K 1_{m_e} 1_n^T\big)^T - 2\Phi K\Phi^T$. Next, let us apply classical scaling to $D$, which by itself defines a flat domain. Because $1_n^T J = 0$, then $\Psi K 1_{m_e} 1_n^T J = 0$, and we readily have $JDJ = -2J\Phi K\Phi^T J = J\Phi(-2K)\Phi^T J$.

Applying MDS to diffusion distances turns out to be nothing but setting $\alpha = -2K$ in the proposed spectral MDS framework. The spectral MDS presented in Eq. 14 and Algorithm 1 can be directly applied to diffusion distances without explicit computation of these distances. Moreover, using data-trained optimal diffusion kernels, such as those introduced in ref. 34, we could obtain a discriminatively enhanced flat domain, in which robust and efficient classification between different classes is realized as part of the construction of the flat target space.
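For instance, reusing `lam` from the earlier sketches and a heat-like kernel (our example choice for $K$; the paper leaves the kernel generic), the coefficient matrix is set directly, with no distance interpolation step at all.

```python
import numpy as np

# Diffusion distances in the spectral MDS machinery (sketch): alpha = -2K.
t = 1.0                                    # diffusion time, an example value
alpha = -2.0 * np.diag(np.exp(-t * lam))   # K_kk = K(lambda_k) = exp(-t*lambda_k)
```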

    Numerical Experiments

The embedding of the intrinsic geometry of a shape into a Euclidean space is known as a canonical form. When the input to an MDS procedure is the set of geodesic distances between every two surface points, the output would be such a form. These structures, introduced in ref. 6, are invariant to isometric deformations of the shape, as shown in SI Appendix.

In our first example, we explore two shapes containing 5,000 vertices each, for which we compute all pairwise geodesic distances. Then, we select a subset of just 50 vertices, using the farthest-point sampling strategy, and extract 50 eigenvectors of the corresponding LBO. Finally, we flatten the surfaces into their canonical forms using the spectral MDS procedure. A qualitative evaluation is presented in Fig. 1, demonstrating the similarity of the results of classical scaling on the full matrix and of the spectral MDS operating on just 1% of the data.

The relations within classes and between classes for lionesses and dogs from the TOSCA database (35) are shown in Fig. 2. Spectral MDS allows us to consider more points and can thus serve as a better clustering tool.

Next, we compute the spectral interpolation of the matrix $\tilde D$ for sampled points from the Armadillo's surface. We use the same number of eigenvectors as that of sample points, and plot the average geodesic error as a function of the number of points we use. The mean relative error is computed by $\operatorname{mean}\big(|\tilde D_{ij} - D_{ij}|/(D_{ij} + \epsilon)\big)$, and it is plotted as a function of the ratio between the number of sample points and the total number of vertices of the shape (Fig. 3, Left). Note that spectral interpolation of the geodesic distances works as predicted. In this example, 5% of the distances between the sample points provide us a distance map that is more than 97% accurate.
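That error measure reads, in a small sketch with hypothetical matrix names:

```python
import numpy as np

# Mean relative geodesic error of Fig. 3; eps guards against division by zero.
def mean_relative_error(D_tilde, D, eps=1e-9):
    return np.mean(np.abs(D_tilde - D) / (D + eps))
```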

In the next experiment, we measured the influence of the number of sampled points on the accuracy of the reconstruction. We first generated several triangulations of a sphere, each at a different resolution reflected by the number of triangles. Next, for each such triangulation, we used spectral interpolation for calculating the geodesic distances using just $\sqrt{n}$ points, where $n$ represents the number of vertices of the sphere at a given resolution. Fig. 3, Right shows the average geodesic error as a function of $n$. The average error is a decreasing function of the number of points in the case where $\sqrt{n}$ points are used for the construction.

Next, we report on an attempt to extract an $\mathbb{R}^4$ manifold embedded in $\mathbb{R}^{10{,}000}$. We try to obtain the low-dimensional hyperplane structure from a given set of 100,000 points in $\mathbb{R}^{10{,}000}$. The spectral MDS could handle $10^5$ points with very little effort, whereas regular MDS had difficulties already with $10^4$ points. Being able to process a large number of samples is important when trying to deal with noisy data, as is often the case in big-data analysis. See the MATLAB code of this experiment in the Supporting Information.

Fig. 4 depicts the embedding of 5,851 images of the handwritten digit 8 into a plane. The data are taken from the Modified National Institute of Standards and Technology (MNIST) database (36). We used a naive metric by which the distance between two images is equal to integration over the pixel-wise differences between them. Even in this trivial setting, the digits are arranged in the plane such that those on the right tilt to the right, and those on the left slant to the left. Fat digits appear at the bottom, and thin ones at the top.
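The naive metric admits a one-line sketch; the use of absolute pixel differences is our assumption, as the text does not specify how the differences are aggregated.

```python
import numpy as np

# Naive image metric (sketch): integrate pixel-wise differences between two
# grayscale images; absolute differences are assumed.
def image_distance(img_a, img_b):
    diff = img_a.astype(np.float64) - img_b.astype(np.float64)
    return float(np.abs(diff).sum())
```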

Fig. 3. (A) Relative geodesic distortion as a function of the relative number of sample points for interpolating $\tilde D$. (B) Average relative geodesic error as a function of the number of vertices. Both plots use logarithmic axes.

    Fig. 4. Flat embedding of handwritten digit 8 from the MNIST database.


Fig. 5 presents two cases of flat embedding: ones vs. zeros, and fours vs. threes, from the MNIST dataset. Here, the pairwise distance is the difference between the distance functions from the white regions in each image. While keeping the metric simple, the separation between the classes of digits is obvious. The elongated classes are the result of slanted digits that appear at different orientations. This observation could help in further study toward automatic character recognition.

Finally, we computed the regular MDS and the spectral MDS on different numbers of samples of randomly chosen images, referred to as points. Two hundred eigenvectors of the LBO were evaluated on an Intel i7 computer with 16 GB memory. The following table shows the time it took to compute the MDS and spectral MDS embeddings.

No. of points   MDS run time, s   Spectral MDS run time, s
 1,000            3                 0.1
 2,000            5.3               0.4
 4,000           46                 2.3
10,000          366                 3.2

    Conclusions

By working in the natural spectral domain of the data, we showed how to reduce the complexity of dimensionality reduction procedures. The proposed method exploits the local geometry of the data manifold to construct a low-dimensional spectral Euclidean space in which geodesic distance functions are compactly represented. Then, reduction of the data into a low-dimensional Euclidean space can be executed with low computational and space complexities. Working in the spectral domain allowed us to account for large distances while reducing the size of the data we process in an intrinsically consistent manner. The leading functions of the LBO eigenbasis capture the semidifferential structure of the data and thus provide an optimal reconstruction domain for the geodesic distances on the smooth data manifold. The proposed framework allowed us to substantially reduce the space and time involved in manifold learning techniques, which are important in the analysis of big data.

ACKNOWLEDGMENTS. This research was supported by Office of Naval Research Award N00014-12-1-0517 and Israel Science Foundation Grant 1031/12.

1. Schwartz EL, Shaw A, Wolfson E (1989) A numerical solution to the generalized mapmaker's problem: Flattening nonconvex polyhedral surfaces. IEEE Trans PAMI 11(9):1005–1008.
2. Drury HA, et al. (1996) Computerized mappings of the cerebral cortex: A multiresolution flattening method and a surface-based coordinate system. J Cogn Neurosci 8(1):1–28.
3. Dale AM, Fischl B, Sereno MI (1999) Cortical surface-based analysis. I. Segmentation and surface reconstruction. Neuroimage 9(2):179–194.
4. Zigelman G, Kimmel R, Kiryati N (2002) Texture mapping using surface flattening via multidimensional scaling. IEEE Trans Vis Comp Graph 8(2):198–207.
5. Grossmann R, Kiryati N, Kimmel R (2002) Computational surface flattening: A voxel-based approach. IEEE Trans PAMI 24(4):433–441.
6. Elad A, Kimmel R (2003) On bending invariant signatures for surfaces. IEEE Trans PAMI 25(10):1285–1295.
7. Schweitzer H (2001) Template matching approach to content based image indexing by low dimensional Euclidean embedding. ICCV 2001: Proceedings of the Eighth International Conference on Computer Vision (IEEE, New York), pp 566–571.
8. Rubner Y, Tomasi C (2001) Perceptual metrics for image database navigation. PhD dissertation (Stanford Univ, Stanford, CA).
9. Aharon M, Kimmel R (2006) Representation analysis and synthesis of lip images using dimensionality reduction. Int J Comput Vis 67(3):297–312.
10. Pless R (2003) Image spaces and video trajectories: Using Isomap to explore video sequences. ICCV 2003: Proceedings of the Ninth IEEE International Conference on Computer Vision (IEEE, New York), pp 1433–1440.
11. Bronstein AM, Bronstein MM, Kimmel R (2005) Three-dimensional face recognition. Int J Comput Vis 64(1):5–30.
12. Borg I, Groenen P (1997) Modern multidimensional scaling: Theory and applications. J Educ Meas 40(3):277–280.
13. Tenenbaum JB, de Silva V, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290(5500):2319–2323.
14. Kimmel R, Sethian JA (1998) Computing geodesic paths on manifolds. Proc Natl Acad Sci USA 95(15):8431–8435.
15. Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326.
16. Donoho DL, Grimes C (2003) Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proc Natl Acad Sci USA 100(10):5591–5596.
17. Belkin M, Niyogi P (2001) Laplacian eigenmaps and spectral techniques for embedding and clustering. Advances in Neural Information Processing Systems (MIT Press, Cambridge, MA), pp 585–591.
18. Coifman RR, Lafon S (2006) Diffusion maps. Appl Comput Harmon Anal 21(1):5–30.
19. Coifman RR, et al. (2005) Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. Proc Natl Acad Sci USA 102(21):7426–7431.
20. de Silva V, Tenenbaum JB (2003) Global versus local methods in nonlinear dimensionality reduction. Advances in Neural Information Processing Systems (MIT Press, Cambridge, MA), pp 705–712.
21. Bengio Y, Paiement JF, Vincent P (2003) Out-of-sample extensions for LLE, Isomap, MDS, eigenmaps, and spectral clustering. Advances in Neural Information Processing Systems (MIT Press, Cambridge, MA), pp 177–184.
22. Asano T, et al. (2009) A linear-space algorithm for distance preserving graph embedding. Comput Geom Theory Appl 42(4):289–304.
23. Bronstein MM, Bronstein AM, Kimmel R, Yavneh I (2006) Multigrid multidimensional scaling. Numer Linear Algebra Appl 13(2-3):149–171.
24. Rosman G, Bronstein MM, Bronstein AM, Sidi A, Kimmel R (2008) Fast multidimensional scaling using vector extrapolation. Technical report CIS-2008-01 (Technion, Haifa, Israel).
25. Rosman G, Bronstein MM, Bronstein AM, Kimmel R (2010) Nonlinear dimensionality reduction by topologically constrained isometric embedding. Int J Comput Vis 89(1):56–68.
26. Rustamov RM (2011) Interpolated eigenfunctions for volumetric shape processing. Vis Comput 27(11):951–961.
27. Raviv D, Bronstein MM, Bronstein AM, Kimmel R (2010) Volumetric heat kernel signatures. Proc ACM Workshop on 3D Object Retrieval (ACM, New York), pp 39–44.
28. Williams CKI (2001) On a connection between kernel PCA and metric multidimensional scaling. Advances in Neural Information Processing Systems (MIT Press, Cambridge, MA), pp 675–681.
29. Bérard P, Besson G, Gallot S (1994) Embedding Riemannian manifolds by their heat kernel. Geom Funct Anal 4(4):373–398.
30. Sun J, Ovsjanikov M, Guibas L (2009) A concise and provably informative multi-scale signature based on heat diffusion. Comp Graph Forum 28(5):1383–1392.
31. Ovsjanikov M, Ben-Chen M, Solomon J, Butscher A, Guibas L (2012) Functional maps: A flexible representation of maps between shapes. ACM Trans Graph 31(4):30:1–30:11.
32. Weyl H (1950) Ramifications, old and new, of the eigenvalue problem. Bull Am Math Soc 56(2):115–139.
33. Pinkall U, Polthier K (1993) Computing discrete minimal surfaces and their conjugates. Exper Math 2(1):15–36.
34. Aflalo Y, Bronstein AM, Bronstein MM, Kimmel R (2012) Deformable shape retrieval by learning diffusion kernels. Proc Scale Space and Variational Methods in Computer Vision (Springer, Berlin), Vol 6667, pp 689–700.
35. Bronstein AM, Bronstein MM, Kimmel R (2008) Numerical Geometry of Non-Rigid Shapes (Springer, New York).
36. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324.


Fig. 5. Flat embedding of zeros (blue) vs. ones (red) (A), and fours (red) vs. threes (blue) (B).
