30
Mutiscale Mapper: A Framework for Topological Summarization of Data and Maps Tamal K. Dey * , Facundo M´ emoli , Yusu Wang * March 31, 2019 Abstract Summarizing topological information from datasets and maps defined on them is a central theme in topological data analysis. Mapper, a tool for such summarization, takes as input both a possibly high dimensional dataset and a map defined on the data, and produces a summary of the data by using a cover of the codomain of the map. This cover, via a pullback operation to the domain, produces a simplicial complex connecting the data points. The resulting view of the data through a cover of the codomain offers flexibility in analyzing the data. However, it offers only a view at a fixed scale at which the cover is constructed. Inspired by the concept, we explore a notion of hierarchical family of coverings which induces a hierarchical family of simplicial complexes connected by simplicial maps, which we call multiscale mapper. We study the resulting structure, its associated persistence module, and its stability under perturbations of the maps and the coverings. The information encoded in multiscale mapper complements that of individual mappers at fixed scales. An upshot of this development is a practical algorithm for computing the persistence diagram of multiscale mapper when the domain is a simplicial complex and the map is a real-valued piecewise-linear function. Contents 1 Introduction 2 2 Topological background and motivation 4 2.1 Maps between coverings .................................. 5 2.2 Pullbacks .......................................... 6 3 Multiscale Mapper 6 3.1 Interleaving of hierarchical families of coverings ..................... 8 3.2 Interleaving of hierarchical families of simplicial complexes ............... 8 4 Understanding Stability of Mapper and Multiscale Mapper 9 4.1 The instability of Mapper ................................. 9 4.2 Good hiearchical family of coverings ........................... 10 4.3 The necessity of (c, s)-good hierarchical family of coverings .............. 11 4.4 Stability against function perturbation .......................... 13 4.5 Stability in general ..................................... 15 * Department of Computer Science and Engineering, The Ohio State University. tamaldey, [email protected] Department of Mathematics and Department of Computer Science and Engineering, The Ohio State University. [email protected] 1 arXiv:1504.03763v1 [cs.CG] 15 Apr 2015

Tamal K. Dey, Facundo M emoli March 31, 2019 arXiv:1504 ... · Tamal K. Dey, Facundo M emoliy, Yusu Wang March 31, 2019 Abstract Summarizing topological information from datasets

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Tamal K. Dey, Facundo M emoli March 31, 2019 arXiv:1504 ... · Tamal K. Dey, Facundo M emoliy, Yusu Wang March 31, 2019 Abstract Summarizing topological information from datasets

Mutiscale Mapper: A Framework for Topological Summarization of

Data and Maps

Tamal K. Dey∗, Facundo Memoli†, Yusu Wang∗

March 31, 2019

Abstract

Summarizing topological information from datasets and maps defined on them is a centraltheme in topological data analysis. Mapper, a tool for such summarization, takes as input botha possibly high dimensional dataset and a map defined on the data, and produces a summaryof the data by using a cover of the codomain of the map. This cover, via a pullback operationto the domain, produces a simplicial complex connecting the data points.

The resulting view of the data through a cover of the codomain offers flexibility in analyzingthe data. However, it offers only a view at a fixed scale at which the cover is constructed.Inspired by the concept, we explore a notion of hierarchical family of coverings which induces ahierarchical family of simplicial complexes connected by simplicial maps, which we call multiscalemapper. We study the resulting structure, its associated persistence module, and its stabilityunder perturbations of the maps and the coverings. The information encoded in multiscalemapper complements that of individual mappers at fixed scales. An upshot of this developmentis a practical algorithm for computing the persistence diagram of multiscale mapper when thedomain is a simplicial complex and the map is a real-valued piecewise-linear function.

Contents

1 Introduction 2

2 Topological background and motivation 42.1 Maps between coverings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52.2 Pullbacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

3 Multiscale Mapper 63.1 Interleaving of hierarchical families of coverings . . . . . . . . . . . . . . . . . . . . . 83.2 Interleaving of hierarchical families of simplicial complexes . . . . . . . . . . . . . . . 8

4 Understanding Stability of Mapper and Multiscale Mapper 94.1 The instability of Mapper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94.2 Good hiearchical family of coverings . . . . . . . . . . . . . . . . . . . . . . . . . . . 104.3 The necessity of (c, s)-good hierarchical family of coverings . . . . . . . . . . . . . . 114.4 Stability against function perturbation . . . . . . . . . . . . . . . . . . . . . . . . . . 134.5 Stability in general . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

∗Department of Computer Science and Engineering, The Ohio State University. tamaldey,

[email protected]†Department of Mathematics and Department of Computer Science and Engineering, The Ohio State University.

[email protected]

1

arX

iv:1

504.

0376

3v1

[cs

.CG

] 1

5 A

pr 2

015

Page 2: Tamal K. Dey, Facundo M emoli March 31, 2019 arXiv:1504 ... · Tamal K. Dey, Facundo M emoliy, Yusu Wang March 31, 2019 Abstract Summarizing topological information from datasets

5 Computing and approximating multiscale mapper 165.1 Piecewise-linear functions on simplicial domains . . . . . . . . . . . . . . . . . . . . . 165.2 Approximating multiscale mapper for general maps . . . . . . . . . . . . . . . . . . . 18

6 The metric point of view 226.1 The pull-back pseudo-metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226.2 Stability of dU,f . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236.3 Cech filtrations using dU,f . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246.4 An interleaving between MM(U, f) and Cech(U, f). . . . . . . . . . . . . . . . . . . . 24

7 Discussion 25

A Constructing a good family of covers 28A.1 A simple (c, s)-good hierarchical family of coverings. . . . . . . . . . . . . . . . . . . 28A.2 A space-efficient (c, s)-good hierarchical family of coverings. . . . . . . . . . . . . . . 28

1 Introduction

Recent years have witnessed significant progress in applying topological ideas to analyzing com-plex and diverse data. Topological ideas can be particularly powerful in deriving a succinct andmeaningful summary of input data. For example, the theory of persistent homology built upon[15, 16, 21] and other fundamental developments [1, 2, 3, 6, 7, 8, 23], has provided a powerful andflexible framework for summarizing information of an input space or a scalar field into a muchsimpler object called the persistence diagram/barcode.

Modern data are often complex both in terms of the domain where they come from and in termsof properties/observations associated with them which are often modeled as maps. For example,we can have a set of patients, where each patient is associated with multiple biological markers,giving rise to a multivariate map from the space of patients to an image domain that may or maynot be the Euclidean space. To this end, we need to develop theoretically justified methods toanalyze not only real-valued scalar fields, but also more complex maps defined on a given domain,such as multivariate, circle valued, sphere valued maps, etc.

There has been interesting pioneering work in this direction, including multidimensional persis-tence [2] and persistent homology for circular valued maps [1]. However, summarizing multivariatemaps using these techniques appears to be very challenging. Our approach takes a different direc-tion, and is inspired by and based on the mapper methodology, recently introduced by Singh et al.in [22]. Taking an observation made in [22] as a starting point regarding the behavior of Mapperunder a change in the covers, we study a multiscale version of mapper, which we will henceforthrefer to as multiscale mapper, that is capable of producing a multiscale summary using a coveringof the codomain at different scales.

Given a (multivariate) map f : X → Z, Singh et al. proposed a novel perspective to createa topological metaphor for the structure behind f by pulling back a covering of the space Z to acovering on X through f . The mapper methodology is general: it can work with any (reasonablytame) continuous maps between two topological spaces, and it converts complex maps and coveringsof the target space, into simplicial complexes, which are much easier to manipulate computationally.It is also powerful and flexible– one can view the map f and a finite covering of the space Z as thelens through which the input data X is examined. By choosing different maps and coverings, theresulting mapper representation captures different aspects of the input data. Indeed, the mapper

2

Page 3: Tamal K. Dey, Facundo M emoli March 31, 2019 arXiv:1504 ... · Tamal K. Dey, Facundo M emoliy, Yusu Wang March 31, 2019 Abstract Summarizing topological information from datasets

methodology has been successfully applied to analyzing various types of data, see e.g, [20, 18], andit is a main component behind the data analytics platform developed by the company Ayasdi.

Contributions. Given an input map f : X → Z and a finite covering U of Z, the induced mapperrepresentation M(U , f) is a simplicial complex encoding the structure of f through the lens of Z.However, the simplicial complex M(U , f) in some sense provides only one snapshot of X in a fixedscale as determined by the scale of the covering U . Using the idea of persistence homology, westudy the evolution of the Mapper output M(f,Uε) for a hierarchical family of coverings U = Uεεat multiple scales (as indexed by ε).

As an intuitive example, consider a real-valued function f : X → R, and a covering Uε ofR consisting of all possible intervals of length ε. As ε approaches 0, the corresponding MapperM(f,Uε) intuitively approaches to the so-called Reeb graph Rf of f . As ε increases, we intuitivelylook at the Reeb graph at coarser and coarser resolution. The multiscale mapper in this caseroughly encodes this simplification process.

The resulting multiscale mapper MM(U, f), which we formally define in §3, consists of a sequenceof simplicial complexes connected with simplicial maps. Upon passing to homology with fieldscoefficients, this gives rise to a persistence module and the information in MM(U, f) can be furthersummarized in the corresponding persistence diagram. In other words, we can now summarize aninput described by a multivariate (or circle/sphere valued) map into a single persistence diagram,much like in traditional persistence homology for real-valued functions. Below we delineate someof the main theoretical studies of the multiscale mapper construction that we present in this paper.

In §3.1 we introduce and discuss concepts that permit studying the stability of multiscalemapper under change in the hierarhical families of coverings. In §4, we show that the multiscalemapper output MM(U, f) is stable with respect to perturbations both in the map f and in thehierarchical family of coverings U. Stability is a highly desirable property for a summary as itimplies robustness to noise in data and in measurements. Interestingly, analogous to the caseof homology versus persistence homology, mapper does not satisfy a stability property, whereasmultiscale mapper does enjoy stability as we prove in this paper. We express our results aboutstability by developing analogues to the ideas of interleaving of persistence modules [3] at the levelsof hierarchical families of coverings and hierarchical families of simplicial complexes.

In §5, we consider the computational aspect of multiscale mapper for one of the most commontypes of input where the domain X is a simplicial complex, and the map is a piecewise-linearfunction. We show that for such input, we only need to consider the restriction of the functionto the 1-skeleton of X to compute both the mapper and the multiscale mapper outputs. Thiscan help to significantly improve the time efficiency of any algorithm to construct the mapper andmultiscale mapper outputs. We also consider the more general case of a map f : X → Z where Xis a simplicial complex but Z is not necessarily real-valued. We show that there is an even simplercombinatorial version of the multiscale mapper whose resulting persistence diagram approximatesthat of the standard multiscale mapper.

Finally, in §6, we show that given a hierarchical family of coverings U and a map f : X → Zthere exists a natural pull-back pseudo-metric dU,f defined on the input domain X. With sucha pseudo-metric on X, we can now construct the standard Cech filtration C = Cechε(X)ε (orRips filtration) in X directly, instead of computing the Nerve complex of the pull-back coverings asrequired by mapper. The resulting filtration C is connected by inclusion maps instead of simplicialmaps. This is easier for computational purposes even though one has a method to compute thepersistence diagram of a filtration involving arbitrary simplicial maps [11]. Furthermore, it turnsout that the resulting sequence of Cech complexes C interleaves with the sequence of complexes

3

Page 4: Tamal K. Dey, Facundo M emoli March 31, 2019 arXiv:1504 ... · Tamal K. Dey, Facundo M emoliy, Yusu Wang March 31, 2019 Abstract Summarizing topological information from datasets

MM(U, f), implying that their corresponding persistence diagrams approximate each other.

2 Topological background and motivation

In this section we recall several important facts about topological spaces and simplicial complexeswhich will be crucial for our constructions. A standard reference is [19].

Let K and L be two finite simplicial complexes over the vertex sets VK and VL, respectively. Aset map φ : VK → VL is a simplicial map if φ(σ) ∈ L for all σ ∈ K. Given a simplicial complex byid we will denote the identity simplicial map, that is, the simplicial map arising from the identitymap on vertices.

Definition 2.1 (Contiguous simplicial maps). Let K and L be two finite simplicial complexes andh1, h2 : K → L be simplicial maps. If for all σ ∈ K it holds that h1(σ) ∪ h2(σ) ∈ L then we saythat h1 and h2 are contiguous.

Recall that contiguous simplicial maps induce the same map at homology level, cf. [19].By an open cover of a topological space X we mean a collection U = Uαα∈A of open sets such

that⋃α∈A Uα = X. In this paper, whenever referring to an open cover, we will always assume that

each Uα is path connected.

Definition 2.2 (Nerve of a covering). Given a finite covering U = Uαα∈A of a topological spaceX, we define the nerve of the covering U to be the simplicial complex N(U) whose vertex set isthe index set A, and where a subset α0, α1, . . . , αk ⊆ A spans a k-simplex in N(U) if and only ifUα0 ∩ Uα1 ∩ . . . ∩ Uαk 6= ∅.

Suppose that we are given a topological space X equipped with a continuous map f : X → Z intoa parameter space Z, where Z is equipped with an open covering U = Uαα∈A for some finiteindexing set A. Since f is continuous, the sets f−1(Uα) also form an open covering of X. For eachα, we can now consider the decomposition of f−1(Uα) into its path connected components, so wewrite f−1(Uα) =

⋃jαi=1 Vα,i, where jα is the number of path connected components in f−1(Uα). We

write f∗(U) for the covering of X obtained this way from the covering U of Z and refer to it as thepullback covering of X induced by U via f .

Notice that there are pathological examples of f where f−1(Uα) may shatter into infinitelymany path components. This motivates introducing the following definition:

Definition 2.3. A continuous function f : X → Z is called well-behaved if for every z ∈ Z, thepreimage f−1(z) has finitely many path connected components.

Examples of well-behaved functions are Morse functions f : X → R and piecewise-linear realvalued functions defined on a triangulated manifold. It follows that for any well-behaved functionf : X → Z and any finite open cover U of Z, the open cover f∗(U) will also be finite.

For future reference, cc(O) for a set O denotes the set of all path connected components of O.

Definition 2.4 (Mapper [22]). Let X and Z be topological spaces and let f : X → Z be a continuousmap. Let U = Uαα∈A be a finite open covering of Z. The mapper construction arising from thesedata is defined to be the nerve simplicial complex of the pullback cover:

M(U , f) := N(f∗(U)).

Remark 2.1. This construction is quite general. It encompasses both the Reeb graph and mergetrees at once: consider X a topological space and f : X → R. Then, consider the following twooptions for U = Uαα∈A, the other ingredient of the construction:

4

Page 5: Tamal K. Dey, Facundo M emoli March 31, 2019 arXiv:1504 ... · Tamal K. Dey, Facundo M emoliy, Yusu Wang March 31, 2019 Abstract Summarizing topological information from datasets

• Uα = (−∞, α) for α ∈ A = R. This corresponds to sublevel sets which in turn lead to mergetrees.

• Uα = (α− ε, α+ ε) for α ∈ A = R, for some fixed ε > 0. This corresponds to (ε-thick) levelsets, which induce a relaxed notion of Reeb graphs.

In these two examples, for simplicity of presentation, the set A was allowed to have inifinite car-dinaltiy. Also, note one can take any open cover of X in this definition. This may give rise toother constructions beyond merge trees or Reeb graphs. For instance, one may choose any pointr ∈ R and let Uα = (r − α, r + α) for each α ∈ A = R or other constructions.

2.1 Maps between coverings

If we have two coverings U = Uαα∈A and V = Vββ∈B of a space X, a map of coverings from Uto V is a set map ξ : A → B so that Uα ⊆ Vξ(α) for all α ∈ A. Given one covering U = Uαα∈A,by id we will denote the identity map which sends each Uα to Uα, α ∈ A.

Remark 2.2 (The nerve and induced maps). We have the following three useful properties of thenerve construction:

1. If we are given a map of coverings from U = Uαα∈A to V = Vββ∈B, i.e. a map ofsets ξ : A → B satisfying the conditions above, then there is an induced simplicial mapN(ξ) : N(U)→ N(V), given on vertices by the map ξ.

Indeed, let A ⊇ σ ∈ N(U). This means that⋂α∈σ Uα 6= ∅. Then

⋂β∈ξ(σ) Vβ =

⋂α∈σ Vξ(α) ⊇⋂

α∈σ Uα 6= ∅, which in turn means that ξ(σ) ∈ N(V).

2. For id : U → U for a covering U , N(id) is exactly the identity map from N(U) to itself.

3. Furthermore, if U ξ→ V ζ→ W are three different coverings of a topological space with theintervening maps of covers between them, then N(ζ ξ) = N(ζ) N(ξ) as well.

The following simple lemma will be very useful later on.

Lemma 2.5 (Maps of covers induce contiguous simplicial maps). Let ζ, ξ : U → V be any twomaps of covers. Then, the simplicial maps N(ζ) and N(ξ) are contiguous.

Remark 2.3. Note that this implies that at homology level, the induced maps will coincide. Thus,the induced map can be deemed canonical.

Proof of lemma 2.5. Write U = Uαα∈A and V = Vββ∈B. Then, for all α ∈ A we have both

Uα ⊆ Vζ(α) and Uα ⊆ Vξ(α).

This means that Uα ⊆ Vζ(α) ∩ Vξ(α) for all α ∈ A. Now take any σ ∈ N(U). We need to prove thatζ(σ) ∪ ξ(σ) ∈ N(V). For this write

⋂β∈ζ(σ)∪ξ(σ)

Vβ =

(⋂α∈σ

Vζ(α)

)∩

(⋂α∈σ

Vξ(α)

)=⋂α∈σ

(Vζ(α) ∩ Vξ(α)

)⊇⋂α∈σ

Uα 6= ∅,

where the last step follows from assuming that σ ∈ N(U).

5

Page 6: Tamal K. Dey, Facundo M emoli March 31, 2019 arXiv:1504 ... · Tamal K. Dey, Facundo M emoliy, Yusu Wang March 31, 2019 Abstract Summarizing topological information from datasets

2.2 Pullbacks

The pullback operation described in §2 exhibits useful properties:

1. When we consider a space X equipped with a continuous map f : X → Z to a topologicalspace Z, and we are given a map of coverings ξ : U → V between coverings of Z, there is acorresponding map of coverings between the respective pullback coverings of X:

f∗(ξ) : f∗(U) −→ f∗(V).

Indeed, we only need to note that if U ⊆ V , then f−1(U) ⊆ f−1(V ), and therefore it is clearthat each path connected component of f−1(U) is included in exactly one path connectedcomponent of f−1(V ). Let’s describe the pullback operation on covers in a precise way.Write U = Uαα∈A, V = Vββ∈B, with Uα ⊆ Vξ(α) for α ∈ A. Let Uα,i, i ∈ 1, . . . , nαdenote the connected components of f−1(Uα) and Vβ,j , j ∈ 1, . . . ,mβ denote the connectedcomponents of Vβ.

Then, the map of coverings f∗(ξ) from f∗(U) to f∗(V) is given by requiring that each set Uα,iis sent to the unique set of the form Vξ(α),j so that Uα,i ⊆ Vξ(α),j .Precisely, for each α ∈ A and i ∈ 1, 2, . . . , nα let γα(i) be the unique element in 1, . . . ,mξ(α)such that Uα,i ⊆ Vξ(α),γα(i). Then, f∗(α, i) := (ξ(α), γα(i)) for each α ∈ A and i ∈ 1, . . . , nα.

2. For id : U → U the previous item implies that f∗(id) is exactly the map id : f∗(U)→ f∗(U).

3. Furthermore, if U ξ→ V ζ→ W are three different coverings of a topological space with theintervening maps of covers between them, then f∗(ζ ξ) = f∗(ζ) f∗(ξ).

3 Multiscale Mapper

Definition 3.1 (Hierarchical family of coverings). A hierarchical family of coverings (HFoCs forshort) U of X with resolution r ∈ R is any collection U =

Uεε≥r of coverings Uε of X together with

maps of coverings uε,ε′ : Uε → Uε′ so that uε,ε = id and uε′,ε′′ uε,ε′ = uε,ε′′ for all r ≤ ε ≤ ε′ ≤ ε′′.Sometimes we write U =

uε,ε′−→ Uε′r≤ε≤ε′ to denote the collection with the maps. Given such a

family U, res(U) will be referred to as its resolution. We say that U is a positive HFoCs wheneverres(U) is strictly positive.

The notion of resolution in the definition of HFoC may seem extraneous. We will see in §4.2 howthis concept is associated with the spatial granularity of the covers. Given a HFoC U, by applyingthe nerve construction to it we thus obtain a diagram of simplicial complexes and simplicial maps.This motivates the following definition:

Definition 3.2 (Hierarchical family of simplicial complexes). A hierarchical family of simplicialcomplexes1 S with resolution r ∈ R is any family S =

Sεε≥r of simplicial complexes Sε together

with simplicial maps sε,ε′ : Sε → Sε′ for ε ≤ ε′ such that sε,ε = id and sε′,ε′′ sε,ε′ = sε,ε′′ for allr ≤ ε ≤ ε′ ≤ ε′′. Given such a family S, we write res(S) to denote its resolution. Sometimes we

write S =Sε

sε,ε′−→ Sε′r≤ε≤ε′ to denote the collection together with the maps.

1Or HFoSCs for short.

6

Page 7: Tamal K. Dey, Facundo M emoli March 31, 2019 arXiv:1504 ... · Tamal K. Dey, Facundo M emoliy, Yusu Wang March 31, 2019 Abstract Summarizing topological information from datasets

Given a HFoCs U of a topological space X, the nerve of each cover in U together with each mapof U provides a hierarchical family of simplicial complexes which we denote by N(U).

The pullback properties described in §2.2 make it possible to take the pullback of a given HFoCsof a space via a given continuous function into another space, so that we obtain:

Proposition 3.3. Let U be a hierarchical family of coverings of Z and f : X → Z be a continuousfunction. Then, f∗(U) is a hierarchical family of coverings of X.

Definition 3.4 (Multiscale Mapper). Let X and Z be topological spaces and f : X → Z bea continuous map. Let U be a HFoCs of Z. Then, the multiscale mapper is defined to be thehierarchical family of the nerve simplicial complexes of the pullback:

MM(U, f) := N(f∗(U)).

Consider for example a sequence res(U) ≤ ε1 < ε2 < . . . < εn of n distinct real numbers. Then,the definition of multiscale mapper MM(U, f) gives rise to the following sub-object

N(f∗(Uε1))→ N(f∗(Uε2))→ · · · → N(f∗(Uεn)) (3.1)

which is a sequence of simplicial complexes connected by simplicial maps, i.e. a filtration in thesense of [11]. The interest in these is that upon applying to them the homology functor Hk(·)(k = 0, 1, 2, . . .) (with coefficients in a field) one obtains persistent modules [13], that is, directedsequences of vector spaces connected by linear maps:

Hk

(N(f∗(Uε1))

)→ Hk

(N(f∗(Uε2))

)→ · · · → Hk

(N(f∗(Uεn))

)(3.2)

Remark 3.1. Note that by Lemma 2.5, the persistent module (3.2) is invariant to the choiceof maps of coverings associated with a HFoCs U. Specifically, suppose we are given two HFoCs

U =Uε

vε,ε′−→ Uε′r≤ε≤ε′ and U′ =

wε,ε′−→ Uε′r≤ε≤ε′ of Z differing only on the connecting maps

vε,ε′ and wε,ε′. Then the pullbacks of vε,ε′ and wε,ε′ induce two maps of coverings f∗(vε,ε′), f∗(wε,ε′) :

f∗(Uε)→ f∗(Uε′), which by Lemma 2.5, in turn induce contiguous maps N(f∗(Uε))→ N(f∗(Uε′))at the level of nerve complexes. It follows that the induced persistent modules (3.2) are identical athomology level.

The information contained in a persistent module can be summarized by its associated persis-tent homology diagrams. However, as pointed out in [3], a finiteness condition is required whichprompted Definition 2.3 and also motivates the definition below.

Definition 3.5. A hierarchical family of coverings U =Uε

is pointwise finite, or p.f. in short,if Uε is finite for each ε ≥ res(U). A p.f. hierarchical family of simplicial complexes is definedanalogously. Observe that N(U) is p.f. if and only if U is p.f.

If the HFoC U referred in (3.2) is p.f. and the function f is well-behaved, then each simplicialcomplex N(f∗(Uεi)) is finite which also implies that their homology groups have finite dimensions.Now one can summarize the persistent module with a finite persistent diagram. The same pipelineapplied to the persistent module above can be implemented for the full sequence MM(U, f) and, foreach k ∈ N, one obtains a persistence diagram DkMM(U, f) (see [13] for background on persistencediagrams). A fundamental result [7] in the theory of persistence is that persistence diagramsare stable in the bottleneck distance meaning that small perturbations to a filtration yield smallvariations in the computed persistence diagrams. An important contribution in this regard is thepaper [3] where stability is expressed directly at the (algebraic) level of persistent modules (3.2)via a quantitative structural condition called interleaving of pairs of persistence modules.

In the forthcoming sections we identify compatible notions of interleaving both at the levelof HFoCs and at the level of hierarchical families of simplicial complexes. These notions will beinstrumental in expressing the stability of multiscale mapper in §4.

7

Page 8: Tamal K. Dey, Facundo M emoli March 31, 2019 arXiv:1504 ... · Tamal K. Dey, Facundo M emoliy, Yusu Wang March 31, 2019 Abstract Summarizing topological information from datasets

3.1 Interleaving of hierarchical families of coverings

In this section we consider HFoCs indexed over R. In practice, however, we will have familiesindexed by a discrete set in R. Any such family can be extrapoloated to a HFoCs indexed over Rby appropriately repeating the coverings at the discrete points over the gaps.

We now adapt the notion of interleaving, originally proposed in the context of persistent modules[3], to the setting of HFoCs and HFoSCs.

Definition 3.6 (Interleaving of hierarchical families of coverings). Let U = Uε and V = Vε betwo hierarchical families of coverings of a topological space X such that res(U) = res(V) = r. Givenη ≥ 0, we say that U and V are η-interleaved if one can find maps of coverings ζε : Uε → Vε+η andξε : Vε → Uε+η for all ε ≥ r. This is represented by the diagram:

· · · −→ Uεζε

// Uε+η //

ζε+η

,,

Uε+2η −→ · · ·

· · · −→ Vε //

ξε

>>

Vε+η

ξε+η

22

// Vε+2η −→ · · ·

(3.3)

Remark 3.2. It is easy to see that if U and V are η1-interleaved and V and W are η2-interleaved,then, U and W are (η1 + η2)-interleaved.

Proposition 3.7. Let f : X → Z be a continuous function and U and V be two η-interleavedhierarchical families of coverings of Z. Then, f∗(U) and f∗(V) are also η-interleaved.

Proof. Recall the pullback properties and Corollary 3.3. Applying the pullback operation f∗ todiagram (3.3) yields the claim immediately.

3.2 Interleaving of hierarchical families of simplicial complexes

We now consider interleaving of hierarchical families of simplicial complexes.

Definition 3.8 (Interleaving of hierarchical families of simplicial complexes). Let S =Sε

sε,ε′−→

Sε′r≤ε≤ε′ and T =

tε,ε′−→ Tε′r≤ε≤ε′ be two hierarchical families of finite simplicial complexes

where res(S) = res(T) = r. We say that they are η ≥ 0 interleaved if for each ε ≥ r one can findsimplicial maps ϕε : Sε → Tε+η and ψε : Tε → Sε+η so that for all ε ≥ r

(i) ϕε+η ψε and tε,ε+2η are contiguous, and

(ii) ψε+η ϕε and sε,ε+2η are also contiguous.

This is represented by the diagram below:

· · · −→ Sεϕε

// Sε+η //

ϕε+η

,,

Sε+2η −→ · · ·

· · · −→ Tε //

ψε

>>

Tε+η

ψε+η

22

// Tε+2η −→ · · ·

(3.4)

Proposition 3.9. Let U and V be two η-interleaved hierarchical families of coverings of X withres(U) = res(V). Then, N(U) and N(V) are also η-interleaved.

8

Page 9: Tamal K. Dey, Facundo M emoli March 31, 2019 arXiv:1504 ... · Tamal K. Dey, Facundo M emoliy, Yusu Wang March 31, 2019 Abstract Summarizing topological information from datasets

Proof. Let r denote the common resolution of U and V. Write U =Uε

uε,ε′−→ Uε′r≤ε≤ε′ and

V =Vε

vε,ε′−→ Vε′r≤ε≤ε′ , and for each ε ≥ r let ζε : Uε → Vε+η and ξε : Vε → Uε+η be given such as

in diagram (3.3). Recall the properties of the nerve construction and consider the nerve of diagram(3.3). This operation yields a diagram identical to (3.4) where for each ε ≥ r:

• Sε := N(Uε), Tε := N(Vε),

• sε,ε′ := N(uε,ε′), for r ≤ ε ≤ ε′; tε,ε′ := N(vε,ε′), for r ≤ ε ≤ ε′; ϕε := N(ζε), and ψε := N(ξε).

To satisfy Definition 3.8, it remains to verify conditions (i) and (ii). We only verify (i), since theproof of (ii) follows the same arguments. For this, notice that both the composite map ζε+η ξεand vε,ε+2η are maps of covers from Vε to Vε+2η. By Lemma 2.5 we then have that N(ζε+η ξε) andN(vε,ε+2η) = tε,ε+2η are contiguous. But, by the properties of the nerve construction N(ζε+η ξε) =N(ζε+η) N(ξε) = ϕε+η ψε, which completes the claim.

From now onwards, for a finite hierarchical family of simplicial complexes S and each k ∈ N,we will denote by DkS the k-th persistence diagram of S with coefficients in the field F. Noticethat applying the simplicial homology functor with coefficients in a (henceforth fixed) field F to adiagram such as (3.4) yields two persistent modules weakly interleaved in the sense of [3]. Thus,we have a stability result for DkMM(U, f) when f is kept fixed but the HFoCs U is perturbed.

Corollary 3.10. For η ≥ 0, let U and V be two positive p.f. hierarchical families of coveringsof Z with res(U) = res(V) . Let f : X → Z be well-behaved and U and V are η-interleaved.Then, MM(U, f) and MM(V, f) are η-interleaved. In particular, the bottleneck distance betweenthe persistence diagrams DkMM(U, f) and DkMM(V, f) is at most 3η for all k ∈ N.

4 Understanding Stability of Mapper and Multiscale Mapper

In this section we will discuss the stability properties of both the standard Mapper and and theMultiscale Mapper constructions.

We utilize the concepts of interleaving on both HFoCs and HFoSCs from §3.1 and §3.2 to expressthe stability of the multiscale mapper construction. To guarantee stability against perturbationsboth in functions and HFoCs, it is necessary to restrict the multiscale mapper to a restricted classof hierarchical families of coverings, which we introduce and justify in §4.2 and 4.3. In §4.4 westudy the impact of perturbing the function f on MM(U, f) while keeping U fixed, and in §4.5 wealso consider perturbing the intervening HFoCs.

4.1 The instability of Mapper

In this section we briefly discuss how one may perceive that the simplicial complexes produced byMapper may not admit a simple notion of stability.

As an example consider the situation in Figure 1. Consider for each δ > 0 the domain Xδ shownin the figure (a topological graph with one loop), and the functions depicted in the figure: theseare height functions fδ and gδ which differ by δ, that is ‖fδ−gδ‖∞ = δ. The open cover is shown inthe middle of the figure. Notice that the Mapper outputs for these two functions w.r.t. the sameopen cover of the co-domain R are different and that the situation can be replicated for each δ > 0.

In contrast, one of the features of the Multiscale Mapper construction is that it is amenableto a certain type of stability under changes in the function and in the hierarchical coverings.This situation is not suprising and is reminiscent of the pattern arising when comparing standard

9

Page 10: Tamal K. Dey, Facundo M emoli March 31, 2019 arXiv:1504 ... · Tamal K. Dey, Facundo M emoliy, Yusu Wang March 31, 2019 Abstract Summarizing topological information from datasets

δ

δ

Figure 1: A situation in which Mapper yields two different answers for similar functions on thesame domain. The construction can be carried out for each δ > 0.

homology (and Betti numbers) computed on fixed simplicial complexes to persistent homology (andBetti barcodes/persistent diagrams) on hierarchical families of simplicial complexes (i.e. filtrations).

In what follows we will introduce a particular class of HFoCs that will be used to express somestability properties enjoyed by Multiscale Mapper.

4.2 Good hiearchical family of coverings

In what follows, we are going to assume that the target compact topological space Z comes endowedwith a metric dZ . For a subset O ⊂ Z, by diam(O) we mean its diameter, that is, the numbersupz,z′∈O dZ(z, z′). For δ ≥ 0 by Oδ we mean the set z ∈ Z| dZ(z,O) ≤ δ.

Definition 4.1 (Good hierarchical family of coverings). Let c ≥ 1 and s > 0. We say that

a hierarchical family of covers W = Wε

wε,ε′−→ Wε′ε≤ε′ for the compact metric space (Z, dZ) is

(c, s)-good if:

1. res(W) = s, and s ≤ diam(Z);

2. diam(W ) ≤ ε for all W ∈ Wε and all ε ≥ s; and

3. for all O ⊂ Z with diam(O) ≥ s, there exists W ∈ Wc·diam(O) such that W ⊇ O.

Remark 4.1. We make the following remarks:

• The underlying intuition is that an ideal HFoCs is one for which c = 1 and s = 0.

• A first example is given by the following construction: pick any s ∈ [0,diam(Z)] and for ε ≥ slet Uε := B(z, ε2), z ∈ Z, i.e. the collection of all ε-balls in Z. The maps wε,ε′ for ε′ ≥ ε

are defined in the obvious way: B(z, ε2) 7→ B(z, ε′

2 ), all z ∈ Z. Then, since any O ⊂ Z is

10

Page 11: Tamal K. Dey, Facundo M emoli March 31, 2019 arXiv:1504 ... · Tamal K. Dey, Facundo M emoliy, Yusu Wang March 31, 2019 Abstract Summarizing topological information from datasets

contained in some ball B(o,diam(O)) for some o ∈ O, this means that Uεε≥s is (2, s)-good.Of course whenever Z is path connected and not a singleton, this HFoCs is infinite. A relatedfinite construction is given next.

• A discrete set P ⊂ Z is called an ν-sample of (Z, dZ) if for any point z ∈ Z, dZ(z, P ) ≤ ν.Given a finite ν-sampling P of Z one can always find a (3, 2ν)-good cover of Z consisting, foreach scale ε ≥ 2ν, of all balls B(p, ε2), p ∈ P . Details are given in Appendix A. This familymay not be space efficient. An example of how to construct a space efficient (c, s)-good coverfor a compact metric space (Z, dZ) can be found in Appendix A.

• Conditions (1) and (2) mean that the family begins at resolution s, and the resolution pa-rameter ε controls the geometric characteristics of the elements of the cover. Parameter s isclearly related to the size/finiteness of U: if s is very small, then the number of sets in Uε forε larger than but close to s has to be large. Also, requiring s to be smaller than or equal tothe diameter of Z ensures that condition 3 is not vacuously satisfied.

• Condition (3) controls the degree to which one can inject a given set in Z inside an elementof the cover. The situation when c > 1 is consistent with having finite coverings for all ε. Onthe other hand, c = 1 may require, for each ε, open covers with infinitely many elements.

• If U is (c, s)-good, then it is (c′, s′)-good for all c′ ≥ c and s′ ≥ s.

A priori the conditions defining a (c, s)-good HFoCs do not guarantee sets O in Z with diametersmaller than the resolution s can be covered by some element W of some Wε, for some ε ≥ s. Thefollowing proposition deals with this situation and will be useful later on.

Proposition 4.2. If Z is path connected and U is a (c, s)-good HFoCs of Z, then whenever O ⊂ Zis s.t. diam(O) < s, there exists W ∈ Wc·(diam(O)+2s) s.t. W ⊇ O.

Remark 4.2. Note how for a small set O, in the sense that diam(O) < s, one obtains only thatO ⊂ W for some W in Wc·(diam(O)+2s). This is in contrast with what property 3 above guaranteesfor sets with diameter diam(O) ≥ s: the existence of W ∈ Wc·diam(O) with O ⊂W.

Proof. Indeed, since Z is path connected then one has that diam(Os) ≥ mindiam(Z), s = s itfollows by the definition that ∃W ′ ∈ Wc·diam(Os) containing O. We conclude since diam(Os) ≤diam(O) + 2s, O ⊂ Os, and since there exists W ∈ Wc(diam(O)+2s) containing W ′.

We will henceforth assume that (Z, dZ) is a compact path connected metric space.

4.3 The necessity of (c, s)-good hierarchical family of coverings

In sub-section §4.4, we aim to obtain a stability result for the multiscale mapper MM(U, f) withrespect to perturbations of f . First, notice that we need to bound the resolution of an HFoC frombelow by a parameter s > 0 to ensure its finiteness. We explain now why we need the secondparameter c for the stability result. Specifically, in what follows, we provide an example of ahierarchical family of coverings of R which does not satisfy the condition 3 in (c, s)-goodness. Thiscauses the persistence diagram D1MM(U, f) to be unstable with respect to small perturbations off .

11

Page 12: Tamal K. Dey, Facundo M emoli March 31, 2019 arXiv:1504 ... · Tamal K. Dey, Facundo M emoliy, Yusu Wang March 31, 2019 Abstract Summarizing topological information from datasets

Construction of a pathological HFoCs. Given M > 0 let I := [−M,M ]. For each ε > 0 and

k ∈ Z let I(ε)k := [k · 2blogε2c, (k+ 1) · 2blogε2c]. Pick any s ≥ 0 and consider the following family W of

coverings of I by closed intervals:

W =Wε

wε,ε′−→ Wε′s≤ε≤ε′ ; where Wε :=

I(ε)k | k ∈ Z s.t. I

(ε)k ∩ I 6= ∅

.

The maps of coverings wε,ε′ :Wε →Wε′ are defined below.

Claim 4.3. For any ε′ > ε, and each element I(ε)k ∈ Wε, there exists a unique k′ ∈ Z such that

I(ε)k ⊆ I

(ε′)k′ .

For each k ∈ Z we set wε,ε′(I(ε)k ) = I

(ε′)k′ where k′ ∈ Z is given the claim above.

Proof of the claim. Write q = blogε2c and q′ = blogε′2 c = q + n for some n ∈ N. We are going to

produce k′ ∈ Z such that k′ · 2q′ ≤ k · 2q and (k + 1) · 2q ≤ (k′ + 1) · 2q′ , which will immediately

imply that I(ε)k ⊆ I

(ε′)k′ .

Choose k′ to be the largest interger such that k′ ·2q′ ≤ k ·2q. This means that k′ ∈ Z is maximalamongst integers for which k′ · 2n ≤ k. By maximality, we have that (k′ + 1) · 2n ≥ k + 1. Then,

this means that (k′ + 1) · 2q′ ≥ (k + 1) · 2q, which implies that indeed I(ε)k ⊆ I

(ε′)k′ .

In fact, I(ε′)k′ is the unique interval inWε′ that contains I

(ε)k due to the fact that intervals inWε′

have disjoint interiors.

Remark 4.3. By the argument in the proof of the above claim, it also follows that for any s ≤ ε <ε′ < ε′′ and any I ∈ Wε, wε,ε′′(I) = wε′,ε′′(wε,ε′(I)); that is, wε,ε′′ = wε′,ε′′ wε,ε′. Hence the mapswε,ε′ : Wε → Wε′s≤ε≤ε′ are valid maps of coverings, satisfying the conditions in Definition 3.1.Hence W is a hierarchical family of (closed) coverings.

Remark 4.4. This natural hierarchical family of coverings W is not (c, s)-good for any constantc. Specifically, consider an arbitrary small interval (−r, r) for any r > 0. Clearly, there is noelement in any Wε that contains this interval. It turns out that the persistence diagram arisingfrom multiscale mapper is unstable w.r.t. perturbations of the input function, as we will show by anexample shortly. Note that, in contrast, stability of the persistence diagrams of multiscale mapperoutputs is guaranteed for (c, s)-good hierarchical family of coverings, as stated in Corollary 4.9 andTheorem 4.10.

An example. We show how instability of multiscale mapper may arise for some choices of Wusing the following example which is similar to the one presented in Figure 1. Let δ be any valuelarger than s, the smallest scale of the hierarchical family of coverings W. Suppose we are given twofunctions defined on a graph G: f, g : G→ R, as shown in Figure 2. It is clear that ‖f − g‖∞ = δ.

D1MM(W, f) consists of the point (s, 2δ), indicating that a non-null homologous loop exists atscale s in the pullback nerve complex N(f∗(Ws)), but is killed at scale 2δ in the nerve complexN(f∗(W2δ)) (specifically, the image of this loop under the simplicial map f∗(ws,2δ) becomes null-homologous).

However, for the function g, there is a loop created at the lowest scale s, but the homology classcarried by this loop is never killed under the simplicial maps g∗(ws,ε′) : N(g∗(Ws)) → N(g∗(Wε′)for any ε′ > s. Thus the persistence diagram D1MM(W, g) consists of the point (s,∞). Hencethe two persistence diagrams D1MM(W, f) and D1MM(W, g) are not close under the bottleneckdistance (in fact, the bottleneck distance between them is ∞), despite the fact that the functionsf and g are δ-close , that is ‖f − g‖∞ ≤ δ.

12

Page 13: Tamal K. Dey, Facundo M emoli March 31, 2019 arXiv:1504 ... · Tamal K. Dey, Facundo M emoliy, Yusu Wang March 31, 2019 Abstract Summarizing topological information from datasets

δ

f g

Figure 2: f and g.

Remark 4.5. We remark that for clarity of presentation, in the above construction, each elementin the coveringWε is a closed interval. However, this example can be easily extended to open covers:Specifically, let ν be a sufficiently small positive value such that ν < s < δ. Then we can changeeach interval I = [k · 2blogε2c, (k+ 1) · 2blogε2c] ∈ Wε to Iν = (k · 2blogε2c − ν, (k+ 1) · 2blogε2c + ν). Notethat for an arbitrary small interval (−r, r) with r > ν, there is no element in any Wε that containsit, so that the resulting HFoCs is not (c, s)-good for any c ≥ 1. Hence the example described inFigure 2 can be adapted whenever δ > ν; this leads to the following statement:

Proposition 4.4. Fix s ≥ 0 and W > s. Then, for each W2 > δ > 0 there exist (1) a topological

graph Gδ, (2) two continuous functions fδ, gδ : Gδ → [−W,W ], and (3) a HFoCs W of [−W,W ]satisfying the following properties:

• ‖fδ − gδ‖∞ = δ,

• D1MM(W, fδ) = (s, 2δ) and D1MM(W, gδ) = (s,∞).

4.4 Stability against function perturbation

Our study of stability against function perturbation involves reindexing the involved HFoCs.

Definition 4.5 (Reindexing). Let W =Wε

wε,ε′−→ Wε′r≤ε≤ε′ be a hierarchical family of coverings

of Z with r = res(W) > 0 and φ : R+ → R+ be a monotonically increasing function. Consider the

hierarchical family of coverings Rφ(W) :=Vε

vε,ε′−→ Vε′φ(r)≤ε≤ε′ given by

• Vε :=Wφ−1(ε) for each ε ≥ φ(r) and

• vε,ε′ = wφ−1(ε),φ−1(ε′) for ε, ε′ such that φ(r) ≤ ε ≤ ε′.

We refer to Rφ(W) as a reindexed family.

In our case, we will use the log function to reindex a HFoCs W. We also need the followingdefinition in order to state the stability results.

Definition 4.6. Given a hierarchical family of covers U = Uε and ε0 ≥ res(U), we define theε0-truncation of U as the hierarchical family Trε0(U) :=

Uεε0≤ε. Observe that, by definition

res(Trε0(U)) = ε0.

13

Page 14: Tamal K. Dey, Facundo M emoli March 31, 2019 arXiv:1504 ... · Tamal K. Dey, Facundo M emoliy, Yusu Wang March 31, 2019 Abstract Summarizing topological information from datasets

Proposition 4.7. Let X be a topological space, (Z, dZ) be a compact path connected metric space,and f, g : X → Z be two continuous functions such that for some δ ≥ 0 one has that maxx∈X dZ(f(x), g(x)) =δ. Let W be any (c, s)-good hierarchical family of coverings of Z. Let ε0 = max(1, s). Then, thelog ε0-truncations of Rlog

(f∗(W)

)and Rlog

(g∗(W)

)are log

(2cmax(δ, s) + c

)-interleaved.

Proof. For notational convenience write η := log(2cmax(δ, s) + c

), Ut = U := f∗(W), and

Vt = V := g∗(W). With regards to satisfying Definition 3.8 and diagram (3.3) for Rlog

(U)

andRlog

(V), for each ε ≥ log ε0 we need only exhibit maps of covers ζε : Uexp(ε) → Vexp(ε+η) and

ξε : Vexp(ε) → Uexp(ε+η). We first establish the following.

Claim 4.8. For all O ⊂ Z, and all δ′ ≥ δ, f−1(O) ⊆ g−1(Oδ′).

Proof. Let x ∈ f−1(O), then dZ(f(x), O) = 0. Thus, dZ(g(x), O) ≤ dZ(f(x), O) +dZ(g(x), f(x)) ≤δ, which implies the claim.

Now, pick any t ≥ ε0, any U ∈ Ut, and fix δ′ := max(δ, s). Then, there exists W ∈ Wt suchthat U ∈ cc(f−1(W )). The claim implies that f−1(W ) ⊆ g−1(W δ′). Since W is a (c, s)-good coverof the connected space Z and s ≤ max(δ, s) ≤ diam(W δ′) ≤ 2δ′ + t, there exists at least oneset W ′ ∈ Wc(2δ′+t) such that W δ′ ⊆ W ′. This means that U is contained in some element ofcc(g−1(W ′)) where W ′ ∈ Wc(2δ′+ε). But, also, since c(2δ′ + t) ≤ c(2δ′ + 1)t for t ≥ ε0 ≥ 1, thereexists W ′′ ∈ Wc(2δ′+1)t such that W ′ ⊆ W ′′. This implies that U is contained in some element ofcc(g−1(W ′′)) where W ′′ ∈ Wc(2δ′+1)t. This process, when applied to all U ∈ Ut, all t ≥ ε0, defines a

map of covers ζt : Ut → V(2cδ′+c)t. Now, define for each ε ≥ log(ε0) the map ζε := ζexp(ε) and noticethat by construction this map has Uexp(ε) as domain, and V(2δ′c+c) exp(ε) as codomain. A similarobservation produces for each ε ≥ log(ε0) a map of covers ξε from Vexp(ε) to V(2cδ′+c) exp(ε).

Notice that for each ε ≥ log(ε0) one may write (2cδ′ + c) exp(ε) = exp(ε + η). So we have infact proved that log ε0-truncations of Rlog

(U)

and Rlog

(V)

are η-interleaved.

Corollary 4.9. Let W be any (c, s)-good hierarchical p.f. family of coverings of the compactconnected metric space Z and let f, g : X → Z be any two well-behaved continuous functions suchthat maxx∈X dZ(f(x), g(x)) = δ. Then, the bottleneck distance between the persistence diagramsDkMM(Rlog

(W), f

)and DkMM

(Rlog(W), g

)is at most 3

(log(2cmax(s, δ) + c) + max(0, log 1

s ))

forall k ∈ N.

Proof. We use the notation of Proposition 4.7. Let U = f∗(W) and V = g∗(W). If max(1, s) = s,then Rlog(U) and Rlog(V) are log(2cmax(s, δ) + c)-interleaved by Proposition 4.7 which gives abound on the bottleneck distance of log(2cmax(s, δ) + c) between the corresponding persistencediagrams. In the case when s < 1, the bottleneck distance remains the same only for the 0-truncations of Rlog(U) and Rlog(V). By shifting the starting point of the two families to the leftby at most | log s| can introduce barcodes of lengths at most log 1

s or can stretch the existingbarcodes to the left by at most log 1

s for the respective persistence modules. To see this, considerthe persistence module below where ε1 = log s:

Hk

(N(f∗(Uε1))

)→ Hk

(N(f∗(Uε2))

)→ · · ·Hk

(N(f∗(U0))

)→ · · · → Hk

(N(f∗(Uεn))

)A homology class born at any index in the range [log s, 0) either dies at or before the index 0 or ismapped to a homology class of Hk

(N(f∗(U0))

). In the first case we have a bar code of length at

most | log s| = log 1s . In the second case, a bar code of the persistence module

Hk

(N(f∗(Uε1=0))

)→ Hk

(N(f∗(Uε2))

)→ · · · → Hk

(N(f∗(Uεn))

)14

Page 15: Tamal K. Dey, Facundo M emoli March 31, 2019 arXiv:1504 ... · Tamal K. Dey, Facundo M emoliy, Yusu Wang March 31, 2019 Abstract Summarizing topological information from datasets

starting at index log 1 = 0 gets stretched to the left by at most | log s| = log 1s . The same conclusion

can be drawn for the persistence module induced by Rlog(V). Therefore the bottleneck distancebetween the respective persistence diagrams changes by at most log 1

s .

Remark 4.6. The proposition and corollary above are in appearance somehow not satisfactory:imagine that f = g, then δ = 0 but by invoking Proposition 4.7 instead of a 0-interleaving oneobtains a

(log(c(2s + 1)) + max(0, log 1

s ))-interleaving. Nevertheless, the claim of the proposition

is almost tight for δ > 0: in fact, by modifying the example in §4.3 for each δ > 0 we can find aspace Xδ, a (c, s)-good HFoCs U of a closed interval I ⊂ R, and a pair of functions fδ, gδ : X → Isuch that ‖fδ − gδ‖∞ = δ but such that the bottleneck distance between the persistence diagrams ofMM(U, fδ) and MM(U, gδ) is at least maxlog(cs), log c, log 1

s = Ω(

log(c(2s+ 1)) + max(0, log 1s )).

4.5 Stability in general

We now consider the more general case which also allows changes in the HFoCs inducing themultiscale mapper.

Theorem 4.10. Let U and V be any two η-interleaved (c, s)-good hierarchical families of coveringsof the compact connected metric space Z and let f, g : X → Z be any two continuous well-behavedfunctions such that maxx∈X dZ(f(x), g(x)) ≤ δ. Then, for ε0 = max(1, s), the log ε0-truncation ofRlog

(f∗(U)

)and Rlog

(g∗(V)

)are log

(2cmax(s, δ) + c+ η

)-interleaved.

Proof. Write Trε0(V) = Vεε≥ε0 and Trε0(U) = Uεε≥ε0 . Denote f∗(V) = Vfε ε≥ε0 , g∗(V) =

Vgε ε≥ε0 , and f∗(U) = Ufε ε≥ε0 , g∗(U) = Ugε ε≥ε0 . By following the argument and using the

notation in Proposition 4.7, for each ε ≥ ε0 one can find maps of coverings Ugε → Uf(2cδ′+c)ε and

Vfε → Vg(2cδ′+c)ε. Also, since Uf and Vf are η-interleaved, for each ε ≥ ε0 there are maps of coverings

Ufε → Vfε+η and Vgε → Ugε+η. Then, for each ε ≥ ε0 one can form the following diagram

Ufε −→ Ugε(2cδ′+c) −→ V

gε(2cδ′+c)+η → V

gε(2cδ′+c+η),

where the last step follows because since ε ≥ ε0 ≥ 1, then we have that ε(2cδ′ + c) + η ≤ε(2cδ′+c+η). Thus, by composing the maps intervening in the diagram above we have constructed

for any ε ≥ ε0 a map of covers Ufε −→ Vgε(2cδ′+c+η). In a similar manner one can construct a map

of covers Vgε −→ Ufε(2cδ′+c+η) for each ε ≥ ε0.This provides the maps Ufexp(ε) → Vg(2cδ′+c+η) exp(ε) and Vgexp(ε) → Uf(2cδ′+c+η) exp(ε) for each

ε ≥ ε0. Since (2cδ′+ c+ η) exp(ε) = exp(ε+ log(2cδ′+ c+ η)), by reindexing by log we obtain thatRlog

(f∗(Trε0(U))

)and Rlog

(g∗(Trε0(V))

)are log(2cδ′ + c+ η)-interleaved.

As an application of the previous theorem and the results of [3] we now precisely obtain:

Corollary 4.11. Let U and V be any two η-interleaved, (c, s)-good hierarchical p.f. families ofcoverings of the compact path connected metric space Z and let f, g : X → Z be any two well-behavedcontinuous functions such that maxx∈X dZ(f(x), g(x)) ≤ δ. Then, the bottleneck distance betweenDkMM

(Rlog(U), f

)and DkMM

(Rlog(V), g

)is bounded by 3

(log(2cmax(s, δ)+c+η

)+max(0, log 1

s ))

for all k ∈ N.

15

Page 16: Tamal K. Dey, Facundo M emoli March 31, 2019 arXiv:1504 ... · Tamal K. Dey, Facundo M emoliy, Yusu Wang March 31, 2019 Abstract Summarizing topological information from datasets

5 Computing and approximating multiscale mapper

In this section, we present two results regarding the efficient computation and approximation ofthe multiscale mapper. Specifically, in §5.1, we show that both mapper and multiscale mapperfor piecewise-linear (PL) functions can be efficiently computed using only the underlying graphstructure.

In §5.2, we consider a general map to a compact metric space. We show that there is a practicaland even simpler “combinatorial” approach for approximating the multiscale mapper for the generalcase.

5.1 Piecewise-linear functions on simplicial domains

In practice, one of the most common types of input is a real-valued PL function f : |K| → R definedon the underlying space |K| of a simplicial complex K. That is, f is given at the vertex set V(K)of K, and linearly interpolated within any other simplex σ ∈ K. For example, such an input arisesnaturally as the Rips (or Cech) complex built from a set of points P , and function values at pointsin P induce a PL function on (the geometric realization of) this Rips complex. Alternatively, itcould be that we are given a topological space X with a triangulation K, and the PL function fon K approximates a continuous function on X. In what follows, we consider this PL setting, andshow that interestingly, if the input function satisfies a mild “minimum diameter” condition, thenwe can compute both mapper and multiscale mapper from simply the 1-skeleton (graph structure)of K. This makes the computation of the multiscale mapper from a PL function significantly fasterand simpler.

Specifically, given a simplicial complex K, let K1 denote the 1-skeleton of K: that is, K1

contains the set of vertices and edges of K. Define f : |K1| → R to be the restriction of f to |K1|;that is, f is the PL function on |K1| induced by function values defined at vertices.

For a given HFoCs W of a compact connected metric space (Z, dZ), we denote by κ(W) :=infdiam(W ); W ∈ W ∈ W the minimal diameter of any element of any cover of the family W.From Definition 4.1, it follows that κ(W) ≤ s whenever W is (c, s)-good.

Condition 5.1 (Minimum diameter condition.). Given a simplicial complex K with a functionf : |K| → Z and a hierarchical family of covers W of the continuous and compact metric space Z,we say that (K, f,W) satisfies the minimum diameter condition if diam(f(σ)) ≤ κ(W) for everysimplex σ ∈ K.

Remark 5.1. In our case, f is a PL function, and thus satisfying the minimum diameter conditionmeans that for every edge e = (u, v) ∈ K1, |f(u)− f(v)| ≤ κ(W). In what follows we assume thatK is connected. We do not lose any generality by this assumption because the arguments below canbe applied for each connected component of K.

Definition 5.1. Two hierarchical families of simplicial complexes S =Sε

sε,ε′−→ Sε′

and T =Tε

tε,ε′−→ Tε′

are isomorphic if:

• res(S) = res(T).

• For each ε ≥ res(S) there is a simplicial isomorphism ηε between Sε and Tε, and

• The simplicial maps sε,ε′ and tε,ε′ are identical up to these isomorphisms of their domainsand codomains: ηε′ sε,ε′ = tε,ε′ ηε for all res(S) ≤ ε ≤ ε′.

16

Page 17: Tamal K. Dey, Facundo M emoli March 31, 2019 arXiv:1504 ... · Tamal K. Dey, Facundo M emoliy, Yusu Wang March 31, 2019 Abstract Summarizing topological information from datasets

Our main result in this section is the following theorem.

Theorem 5.2. Given a PL function f : |K| → R on a simplicial complex K and a hierarchicalfamily of covers W of the image of f with (K, f,W) satisfying the minimum diameter condition,the multiscale mapper for |K| is isomorphic to that for |K1|, that is, MM(W, f) is isomorphic toMM(W, f).

Proof. We show in Proposition 5.3 that the mapper outputs M(W, f) and M(W, f) are identicalup to a relabeling of their vertices (hence simplicially isomorphic) for every W ∈ W. Also, sincethe simplicial maps in the filtrations MM(W, f) and MM(W, f) are induced by the pullback ofthe same HFoC W, they are identical again up to the same relabeling of the vertices. This thenestablishes the theorem.

In what follows, for clarity of exposition, we use X and X1 to denote the underlying space |K|and |K1| of K and K1, respectively. Also, we do not distinguish between a simplex σ ∈ K andits image |σ| ⊆ X and thus freely say σ ⊆ X when it actually means that |σ| ⊆ X for a simplexσ ∈ K.

Proposition 5.3. If (K, f,W) satisfies the minimum diameter condition, then for every W ∈W,M(W, f) is identical to M(W, f) up to relabeling of the vertices.

Proof. Let U = f∗(W) and U = f∗(W). By definition of f , each U ∈ U is a connected componentof some U ∩X1 for some U ∈ U . In Proposition 5.5, we show that U ∩X1 is connected for everyU ∈ U . Therefore, for every element U ∈ U , there is a unique element U = U ∩X1 in U and viceversa. The following Claim 5.4 finishes the proof.

Claim 5.4.⋂ki=1 Ui 6= ∅ if and only if

⋂ki=1 Ui 6= ∅.

Proof. Clearly, if⋂ki=1 Ui 6= ∅, then

⋂ki=1 Ui 6= ∅ because Ui ⊆ Ui. For the converse, assume⋂k

i=1 Ui 6= ∅ and pick any x ∈⋂ki=1 Ui. Let σ ⊆ X be a simplex which contains x. Consider the

level set L = f−1(f(x)). Since f is PL, Lσ = L ∩ σ is connected and contains a point y ∈ σ ∩X1.Then, y ∈ Ui for each i ∈ 1, . . . , k because x and y are connected by a path in Ui ∩ σ andf(x) = f(y). Therefore, y ∈

⋂i Ui. Since y ∈ X1 it follows that y ∈

⋂i Ui.

Proposition 5.5. If (X, f,W) satisfies the minimum diameter condition, then for every W ∈ Wand every U ∈ f∗(W), the set U ∩X1 is connected.

Proof. Fix U ∈ f∗(W). If U ∩ X1 is not connected, let C1, . . . , Ck denote its k ≥ 2 connectedcomponents. First, we show that each Ci contains at least one vertex of X1. Let e = (u, v)be any edge of X1 that intersects U . If both ends u and v lie outside U , then |f(u) − f(v)| >|maxU f − minU f | ≥ κ(W). But, this violates the minimum diameter condition. Thus, at leastone vertex of e is contained in U . It immediately follows that Ci contains at least one vertex of X1.

Let ∆ be the set of all simplices σ ⊆ X so that σ ∩U 6= ∅. Fix σ ∈ ∆ and let x be any point inσ ∩ U .

Claim 5.6. There exists a point y in an edge of σ so that f(x) = f(y).

Proof. We first observe that V+(σ) := v ∈ V(σ)|f(v) ≥ f(x) is non-empty. Otherwise f(v) <f(x) for for all vertices v ∈ σ. In that case f(x′) < f(x) for all points x′ ∈ σ contradicting that xbelongs to σ. Now, if V+(σ) = V(σ), there is an edge e ⊆ σ so that x ∈ e and taking y = x servesthe purpose. If V+(σ) 6= V(σ), there is a vertex u ∈ V(σ) so that f(u) < f(x). Taking a vertexv from the non-empty set V+(σ), we get an edge e = (u, v) where f(u) < f(x) and f(v) ≥ f(x).Since f is PL, it follows that e has a point y so that f(y) = f(x).

17

Page 18: Tamal K. Dey, Facundo M emoli March 31, 2019 arXiv:1504 ... · Tamal K. Dey, Facundo M emoliy, Yusu Wang March 31, 2019 Abstract Summarizing topological information from datasets

Since σ contains an edge e that is intersected by U , it contains a vertex of e that is containedin U . This means every simplex σ ∈ ∆ has a vertex contained in U . For each i = 1, . . . , k let∆i := σ ⊆ X |V(σ) ∩ Ci 6= ∅. Since every simplex σ ∈ ∆ has a vertex contained in U , we have∆ =

⋃i ∆i. We argue that the sets ∆1, . . . ,∆k are disjoint from each other. Otherwise, there exist

i 6= j and a simplex σ with a vertex u in ∆i and another vertex v in ∆j . Then, the edge (u, v) mustbe in U because f is PL. But, this contradicts that Ci and Cj are disjoint. This establishes thateach ∆i is disjoint from each other and hence ∆ is not connected contradicting that U is connected.Therefore, our initial assumption that U ∩X1 is disconnected is wrong.

Real-valued functions on triangulable topological space. A PL function f : |K| → R δ-approximates a continuous function g : X → R defined on a topological space X if there exists ahomeomorphism h : X → |K| such that for any point y ∈ |K|, we have that |f(y)− g h−1(y)| ≤ δ.The following result states that if a PL function δ-approximates a continuous real-valued functionon X, then the persistence diagrams induced by the respective multiscale mappers are also close.

Corollary 5.7. Let f : |K| → R be a piecewise-linear function on K that δ-approximates a contin-uous function g : X → R. Let W be a (c, s)-good hierarchical p.f. family of coverings of R. Then thebottleneck distance between the persistence diagrams DkMM

(Rlog(W), f

)and DkMM

(Rlog(W), g

)is at most 3

(log(2cmax(s, δ) + c

)+ max(0, log 1

s ))

for all k ∈ N.

Proof. Let g : |K| → R denote the push forward of g : X → R by the homeomorphism h : X → |K|.By the definition of δ-approximation, we know that ‖f − g‖∞ ≤ δ. Hence by Corollary 4.9, thebottleneck distance between the persistence diagrams DkMM

(Rlog(W), f

)and DkMM

(Rlog(W), g)

is at most 3(

log(2cmax(s, δ) + c) + max(0, log 1s ))

for all k ∈ N.On the other hand, since h is homeomorphism, it is easy to verify that the pullback of W via g

and via g induce isomorphic persistence modules: Hk(MM(Rlog(W), g))) ∼= Hk(MM(Rlog(W), g))).Combining this with the discussion in previous paragraph, the claim then follows.

5.2 Approximating multiscale mapper for general maps

While results in the previous section concern real-valued PL-functions, we now provide a significantgeneralization for the case of a map f : |K| → Z from the underlying space of K into an arbitrarycompact metric space Z. Specifically, we show that there is a “combinatorial” version of the(multiscale) mapper, such that each connected component in the pullback f−1(W ) of any W inthe covering of Z contain only vertices of K. Hence, the construction of the Nerve complex forthis modified (multiscale) mapper is purely combinatorial, simpler and more efficient to implement.But we lose the “exactness”, that is, the simpler mapper only “approximates” the actual multiscalemapper at the homology level. Also, it requires a (c, s)-good HFoCs of Z.

In practice, the most common type of input domain is a simplicial complex K. Given a mapf : |K| → Z defined on the underlying space |K| of K, in order to implement the mapper andmultiscale mapper constructions, one needs to compute f∗(W) for a covering W of the compactmetric space Z.

Specifically, for any W ∈ W one needs to compute the pre-image f−1(W ) ⊂ K and shatterit into connected components. Even in the setting adopted in §5, where we have a PL functionf : |K1| → R defined on the 1-skeleton K1 of K, the connected components in cc(f−1(W )) maycontain vertices, edges, and also partial edges: See Figure 3 for an example, where we have a PLfunction f on a graph G. Next we introduce a combinatorial version of both mapper and multiscalemapper, where each connected component in the pullback f−1(W ) will contain only vertices of K.

18

Page 19: Tamal K. Dey, Facundo M emoli March 31, 2019 arXiv:1504 ... · Tamal K. Dey, Facundo M emoliy, Yusu Wang March 31, 2019 Abstract Summarizing topological information from datasets

W

f

G

f−1(W )

Figure 3: Partial thickened edges belong to the two connected components in f−1(W ). Note thateach set in ccG(f−1(W ) contains only the set of vertices of a component in cc(f−1(W )).

Combinatorial construction of mapper and multiscale mapper. Let G be a graph withvertex set V(G) and edge set E(G). Suppose we are given a map f : V(G) → Z and a finiteopen covering W = Wαα∈A of the metric space (Z, dZ). For any Wα ∈ W, consider its pre-image f−1(Wα) ⊆ V(G). In the standard mapper definition, one considers the set of connectedcomponents of f−1(Wα) for all α ∈ A and builds the nerve complex of the resulting pullback cover.

In the current setting, each f−1(Wα) will be regarded a set of disjoint points, but we willconsider the connectivity induced by the input graph G. We now formalize this:

Definition 5.8. Given a set of points O ⊆ V(G), the set of connected components of O inducedby the graph G, denoted by ccG(O), is the partition of O into maximal subset of points connectedin G. We refer to each such maximal subset of vertices as a G-induced connected component of O.We define f∗G(W), the G-induced pull-back via the function f , as the collection of all G-inducedconnected components ccG(f−1(Wα)) for all α ∈ A.

Definition 5.9. (G-induced multiscale mapper) Similar to the mapper construction, we define theG-induced mapper MG(W, f) as the nerve complex N(f∗G(W)).

Given a hierarchical family W = Wε of coverings of Z, we define the G-induced multiscalemapper MMG(W, f), from the filtration of G-induced nerve complexes N(f∗G(Wε)) | Wε ∈ Win the same way of defining multiscale mapper MM(W, f) from the filtration of nerve complexesN(f∗(Wε)) | Wε ∈W.

Remark 5.2. We note that if Z ⊂ R, then f induces a piecewise linear function f : |G| → Ron the graph G, when G is regarded as a 1-dimensional simplicial complex. Then it is easy toverify that each G-induced component in ccG(f−1(Wα)) corresponds to the vertex set of a connectedcomponent in cc(f−1(Wα). Hence compared to the components in cc(f−1(Wα)) which may containpartial edges, a G-induced component is simply a set of vertices, and much easier to handle. (SeeFigure 3 for an illustration.) Checking common intersection between multiple G-induced componentsis now a purely combinatorial task.

19

Page 20: Tamal K. Dey, Facundo M emoli March 31, 2019 arXiv:1504 ... · Tamal K. Dey, Facundo M emoliy, Yusu Wang March 31, 2019 Abstract Summarizing topological information from datasets

We remark that the idea of a combinatorial mapper is not new: indeed, the implementation ofthe popular Mapper software [22] and the so-called α-Reeb graph in [4] both use this combinatorialidea.

Multiscale mapper versus combinatorial multiscale mapper. Given a map f : |K| → Zdefined on the underlying space |K| of a simplicial complex K, let fV : V(K) → R denote therestriction of f to the vertices of K. Consider the graph K1 as providing connectivity informationfor points in V(K).

Given any HFoCs W of the metric space Z, we can now compute both the standard multi-scale mapper MM(W, f) and the K1-induced multiscale mapper MMK1(W, fV ) (also called thecombinatorial multiscale mapper of f w.r.t. W).

Our main result of this section is that whenever W is a (c, s)-good HFoCs of Z, then the resultingtwo hierarchical families of simplicial complexes interleave in the sense of Definition 3.8 and as aconsequence, their respective persistence diagrams are at bounded distance from each other.

Thus, the combinatorial multiscale mapper is not only simpler to compute in practice, but itcan still provably approximate the persistence diagram of the multiscale mapper for a good familyof coverings. More precisely, we have the following main result.

Theorem 5.10. Assume that (Z, dZ) is a compact and connected metric space. Given a mapf : |K| → Z defined on the underlying space of a simplicial complex K, let fV : V(K) → Z be therestriction of f to the vertex set V(K) of K.

Given a (c, s)-good positive and p.f. HFoCs W of Z such that (K, f,W) satisfies the minimumdiameter condition (cf. Condition 5.1), the bottleneck distance between the persistence diagramsDkMM

(Rlog(W), f

)and DkMMK1

(Rlog(W), fV

)is at most 3 log(3c)+3 max(0, log 1

s ) for all k ∈ N.

The remainder of this section is devoted to proving Theorem 5.10.

In what follows, the input HFoCs W =Wε

wε,ε′−→ Wε′ε≤ε′ is (c, s)-good, and we set ρ = 3c.

To simplify notation, for each ε ≥ s let Mε denote the the nerve complex N(f∗(Wε)) and MK1ε

the K1-induced nerve complex (the combinatorial mapper) N(f∗K1V (Wε)).

Our goal now is to show that there exist maps φε and νε so that diagram-(A) below commutesat the homology level, which then implies that the two multiscale mappers MM(RlogW, f) andMMK1(RlogW, fV ) (weakly) log ρ-interleave, thus proving Theorem 5.10.

(A). Mε

φε ''

θε,ρε //Mρε

MK1ε

νε

OO

θK1ε,ρε

//MK1ρε

νρε

OO(B). Mε

φε ''

sε //Mρε

MK1ε

νε

OO

tε//MK1

ρε

νρε

OO(5.5)

In the diagram above, the map θε,ε′ := N(f∗(wε,ε′)) : Mε → Mε′ is the simplicial map induced by

the pullback of the map of coverings wε,ε′ :Wε →Wε′ . Similarly, the map θK1ε,ε′ := N(f

∗K1V (wε,ε′)) :

MK1ε → MK1

ε′ is the simplicial map induced by the pullback of the map of coverings wε,ε′ :Wε →Wε′ .

The maps νε. Now we define the map νε : MK1ε → Mε. Given a nerve complex Mε (resp.

MK1ε ), we abuse the notation slightly and identify a vertex in Mε with its corresponding connected

component in f∗(Wε) (resp. in f∗K1V (Wε)).

First, note that by construction of the combinatorial mapper, there is a natural map uW :ccK1(f−1V (W ))→ cc(f−1(W )) for any W ∈ Wε: For any V ∈ ccK1(f−1V (W )), obviously, all vertices

20

Page 21: Tamal K. Dey, Facundo M emoli March 31, 2019 arXiv:1504 ... · Tamal K. Dey, Facundo M emoliy, Yusu Wang March 31, 2019 Abstract Summarizing topological information from datasets

in V are K1-connected in f−1(W ). Hence there exists a unique set UV ∈ cc(f−1(W )) such thatV ⊆ UV . We set uW (V ) = UV for each V ∈ ccK1(f−1V (W )). The amalgamation of such maps for all

W ∈ Wε gives rise to the map νε : f∗K1V (Wε) → f∗(Wε), whose restriction to each ccK1(f−1V (W ))

is simply uW as defined above. Abusing notation slightly, we use νε : V(MK1ε )→ V(Mε) to denote

the corresponding vertex map as well. Since for any set of vertices V ∈ f∗K1V (Wε), we have that

V ⊆ νε(V ) ⊆ |K|. It then follows that non-empty intersections of sets in f∗K1V (Wε) imply non-

empty intersections of their images via νε in f∗(Wε). Hence this vertex map induces a simplicialmap which we still denote by νε : MK1

ε → Mε.

Auxiliary maps of coverings µε. What remains is to define the maps φεs in diagram-(A). Todefine them, we need to first introduce the following map of coverings: µε : Wε → Wρε for anyε ≥ s. Specifically, given any W ∈ Wε, let W s denote the “thickening” of W in Z by s, that is,W s = x ∈ Z | dZ(x,W ) ≤ s. Since W is (c, s)-good, there exists at least one set W ′ ∈ Wρε suchthat W s ⊆W ′. This is because for any ε ≥ s:

s ≤ diam(W s) ≤ diam(W ) + 2s = ε+ 2s ≤ 3ε ⇒ c · diam(W s) ≤ 3cε = ρε.

We set µε(W ) = W ′: There may be multiple choices of sets in Wρε that contains W , we pick anarbitrary but fixed one.

Let sε : Mε → Mρε and tε : MK1ε → MK1

ρε denote the simplicial map induced by the pullbacks ofthe covering map µε : Wε → Wρε via f and via fV , respectively. In other words, sε = N(f∗(µε))

and tε = N(f∗K1V (µε)).

The maps φε. We now define the map φε : Mε → MK1ρε with the help of diagram-(B) in Eqn

(5.5).Fix W ∈ Wε. Given any set U ∈ cc(f−1(W )), write Vββ∈AU for the preimage ν−1ε (U) of U

under the vertex map νε.Note that we have that

Vβ ⊆ U for any β ∈ AU , and⋃

β∈AU

Vβ = V(K) ∩ U. (5.6)

We claim that tε(Vβ) = tε(Vβ′) for any β, β′ ∈ AU .Indeed, since Vβ and Vβ′ are contained in the path connected component U , let π(x, y) ⊆ U

be any path connecting a vertex x ∈ Vβ and y ∈ Vβ′ in U . Let σ1, . . . , σa be the collection ofsimplices that intersect π(x, y). By the minimum diameter condition (cf. Condition 5.1) and thedefinition of the map µε, we have that

f(σj) ⊆W s ⊆ µε(W ) ∈ Wρε, for any j ∈ 1, . . . , a.

Hence Vβ ∪ Vβ′ ∪ σ1, . . . , σa ⊆ sε,ρε(U) ∈ cc(f−1(µε(W ))). Furthermore, since the edges ofsimplices σ1, . . . , σa connect x ∈ Vβ and y ∈ Vβ′ , vertices in Vβ and Vβ′ are thus connectedby edges in sε(U). That is, there exists a K1-induced component (a subset of vertices of K)V ∈ ccK1(f−1V (µε(W )))) such that Vβ ∪ Vβ′ ⊆ V ′ for any β, β′ ∈ AU . Hence, all Vβs with β ∈ AUhave the same image V ′ in V(MK1

ρε ) under the map tε, and we set φε(U) := V ′ to be this commonimage. Note that by Eqn (5.6), V(K) ∩ U ⊆ φε(U). In fact, the above argument can also be usedto prove the following:

Claim 5.11. The vertices of any simplex that intersects U will be contained in φε(U).

21

Page 22: Tamal K. Dey, Facundo M emoli March 31, 2019 arXiv:1504 ... · Tamal K. Dey, Facundo M emoliy, Yusu Wang March 31, 2019 Abstract Summarizing topological information from datasets

Lemma 5.12. The vertex map φε introduced above induces a simplicial map which we also denoteby φε : Mε → MK1

ρε .

Proof. We need to show the following: given a k-simplex τ = p0, p1, . . . , pk ∈ Mε, where eachvertex pj corresponds to set Uj ⊆ |K|, then we have that

⋂kj=0 φε(Uj) 6= ∅.

To prove this, take any point x ∈ |K| such that x ∈⋂kj=0 Uj . Suppose x is contained in a

simplex σ ∈ K. By Claim 5.11, the vertices of σ are contained in φε(Uj), that is, V(σ) ⊂ φε(Uj),

for any j ∈ 0, . . . , k. Hence⋂kj=0 φε(Uj) ⊇ V(σ) 6= ∅. The lemma then follows.

Finally, by construction of φε, the lower triangle in diagram-(B) in Eqn (5.5) commutes. This,combined with the fact that the square in diagram-(B) commutes, also implies that the top trianglecommutes. Furthermore, Lemma 2.5 states the two maps of covers f∗(wε,ρε) and f∗(µε) will inducecontiguous simplicial maps θε,ρε and sε. Similarly, θK1

ε,ρε is contiguous to tε. Since contiguous mapsinduce the same map at the homology level, it then follows that diagram-(A) in Eqn (5.5) commutesat the homology level. Using the Weak Stability Theorem of [3], as well as an argument similar tothe proof of Corollary 4.9, Theorem 5.10 then follows.

6 The metric point of view

The multiscale mapper construction from previous sections yields a hierarchical family of simplicialcomplexes, and a subsequent application of the homology functor with field coefficients provides apersistence module and hence a persistence diagram.

An alternative idea is to use the data (U, f : X → Z) to induce a metric on X, and thenconsider the Rips or Cech filtrations arising from that metric, and in turn obtain a persistencediagram. Both viewpoints are valid in that they both produce topological summaries out of thegiven data. The first approach proceeds at the level of filtrations and then simplicial maps, andthen persistent homology, whereas the other readily produces a metric on X and then followsthe standard metric point of view with geometric complexes. Notice that these two approachesyield different computational problems: the MM approach leads to persistence under simplicialmaps [11], and the metric approach leads to persistence under inclusion maps, which appears to becomputationally easier with state-of-the-art techniques.

6.1 The pull-back pseudo-metric

Assume that a continuous function f : X → Z and a hierarchical family of coverings U withres(U) ≥ 0 of Z are given. We wish to interpret ε as the size of each U ∈ Uε at least for ε ≥ s whenU is a (c, s)-good cover. We define the function dU,f : X ×X → R+ for x, x′ ∈ X s.t. x 6= x′ by

dU,f (x, x′) := infε > 0| ∃V ∈ f∗(Uε) with x, x′ ∈ V

, (6.7)

and by dU,f (x, x) := 0 for all x ∈ X. Notice that dU,f (x, x′) ≥ s whenever x 6= x′. We refer to dU,fas the pull-back pseudo-metric.

Lemma 6.1. If U is a (c, s)-good hierarchical family of covers of the compact connected metricspace Z, then dU,f satisfies:

• dU,f (x, x′) = dU,f (x′, x) for all x, x′ ∈ X.

• dU,f (x, x′) ≤ c ·(dU,f (x, x′′) + dU,f (x′′, x′) + 2s

)for all x, x′, x′′ ∈ X.

22

Page 23: Tamal K. Dey, Facundo M emoli March 31, 2019 arXiv:1504 ... · Tamal K. Dey, Facundo M emoliy, Yusu Wang March 31, 2019 Abstract Summarizing topological information from datasets

Proof of Lemma 6.1. That the definition yields a symmetric function is clear. The second propertyfollows from the c-goodness of U and Proposition 4.2. Indeed, assume that V1 ∈ f∗(Uε1) is s.t.x, x′ ∈ V1 and V2 ∈ f∗(Uε2) is s.t. x′, x′′ ∈ V2 . Then there exists U1 ∈ Uε1 s.t. V1 ∈ cc(f−1(U1)).Similarly, there exists U2 ∈ Uε2 s.t. V2 ∈ cc(f−1(U2)). Clearly, V1 ∩ V2 6= ∅, so that U1 ∩ U2 6= ∅.Then, diam(U1 ∪ U2

)≤ ε1 + ε2. The proof concludes via Proposition 4.2.

Remark 6.1. Note that despite the fact that we call dU,f a pseudo-metric, it does not satisfy thetriangle inequality in a strict sense. According to the lemma above it does, however, satisfy a relaxedversion of such inequality. Furthermore, in the ideal case when c = 1 and s = 0, dU,f satisfies thetriangle inequality. Thus, in what follows we will take the liberty of calling it a pseudo-metric.

Notice that this relaxed triangle inequality does not preclude our ability to consider the inducedCech complex and to develop the interleaving results in Section 6.4.

6.2 Stability of dU,f

In the same manner that we established the stability of the multiscale mapper construction, onecan answer what is the stability picture for the Cech construction over (X, dU,f ). A first step is tounderstand how dU,f changes when we alter both the function f and the hierarchical family U.

Proposition 6.2 (Stability of pull-back pseudo-metric under cover perturbations). Let f : X → Zbe a continuous function and U and V be two η-interleaved positive hierarchical coverings of Z.Then,

∀x, x′ ∈ X, dU,f (x, x′) ≤ dV,f (x, x′) + η.

Proof. Assume the hypothesis. Then, by Lemma 3.7 f∗(U) and f∗(V) are η-interleaved as well.Pick x, x′ ∈ X and assume that dU,f (x, x′) < ε. Let U ∈ f∗(Uε) be such that x, x′ ∈ U . Then, onecan find V ∈ Vε+η such that U ⊂ V . Then, since x, x′ ∈ V , it follows that

dV,f (x, x′) ≤ ε+ η.

Since this holds for any ε > dU,f (x, x′), we obtain that dU,f (x, x′) ≤ dV,f (x, x′) + η.

Proposition 6.3 (Stability of pull-back pseudo-metric against function perturbation). Let U be a(c, s)-good hierarchical family of covers of the compact connected metric space Z and let f, g : X → Zbe two continuous functions such that for some δ ≥ 0 one has maxx∈X dZ(f(x), g(x)) ≤ δ. Then,

∀x, x′ ∈ X, dU,g(x, x′) ≤ c(dU,f (x, x′) + 2 max(δ, s)

).

Proof. Write U = Uε. Let dU,f (x, x′) < ε. Then, x and x′ are in V for some V ∈ cc(f−1(U)) forsome U ∈ Uε. Write δ′ = max(δ, s). We know f−1(U) ⊆ g−1(U δ

′). Since s ≤ diam(U δ

′) ≤ ε + 2δ′

and U is a (c, s)-good cover of Z, there exists U ′ ∈ Uc(ε+2δ′) so that U δ′ ⊂ U ′. Thus, V ⊆ g−1(U ′).

It follows by definition that dU,g(x, x′) ≤ c · (ε + 2δ′). Since the argument holds for all ε > 0, we

have the result.

As a corollary of the two preceding propositions we obtain the following statement:

Proposition 6.4. Let ε0, c ≥ 1 and η, s > 0. Let U and V be any two η-interleaved, ε0-truncationsof (c, s)-good hierarchical families of coverings of the compact connected metric space Z and letf, g : X → Z be any two continuous functions such that maxx∈X dZ(f(x), g(x)) ≤ δ. Then, for allx, x′ ∈ X γ−1 · dV,f (x, x′) ≤ dU,g(x, x′) ≤ γ · dV,f (x, x′), where γ :=

(2cmax(s, δ) + c+ η

).

23

Page 24: Tamal K. Dey, Facundo M emoli March 31, 2019 arXiv:1504 ... · Tamal K. Dey, Facundo M emoliy, Yusu Wang March 31, 2019 Abstract Summarizing topological information from datasets

6.3 Cech filtrations using dU,f

Given any point x ∈ X, we denote the ε-ball around x by Bε(x) = x′ ∈ X | dU,f (x, x′) ≤ ε. Wehave the following observation.

Lemma 6.5. Let f : X → Z be continuous and U any positive hierarchical family of covers of Z.Consider the pseudo-metric space (X, dU,f ). Then, for any x ∈ X and ε ≥ 0,

Bε(x) =⋃

V ∈f∗(Uε), s.t. x ∈ VV.

Proof. By definition, Bε(x) =⋃δ≤εV ∈ f∗(Uδ); x ∈ V . Now, since the family U is hierarchical

(cf. Definition 3.1), then whenever δ ≤ ε and V ∈ f∗(Uδ) one will also have that V ∈ f∗(Uε), sothat the conclusion follows.

The fact that as defined dU,f is a non-negative symmetric function on X ×X permits definingthe Cech (and also Rips) filtration on the powerset pow(X) of X. The Cech filtration is given bythe function FC : pow(X)→ R where for each σ ⊂ X, FC(σ) := infx∈X supx′∈σ dU,f (x, x′).

Corollary 6.6. Under the hypothesis of Theorem 6.4, Cech(Rlog(U), f

)and Cech

(Rlog(V), g

)are

log(2cmax(s, δ) + c+ η

)-interleaved.

Because of the above corollary, the persistence diagrams arising from the Cech filtration on Xare stable in the bottleneck distance.

6.4 An interleaving between MM(U, f) and Cech(U, f).

Interestingly, it turns out that these two views: the pullback of coverings and an induced pullbackmetric, are closely related. Specifically, by putting together the elements discussed in this sectionwe can state a theorem specifying a comparability between MM(U, f) and Cech(U, f) under thelog-reindexing.

Theorem 6.7. Let (Z, dZ) be a compact connected metric space, U be a (c, s)-good HFoCs of Zwith s ≥ 1, and f : X → Z be continuous. Then, the hierarchical families of simplicial complexesMM

(RlogU, f

)and Cech

(RlogU, f

)are log(c(s+ 2))-interleaved.

Corollary 6.8. Under the hypotheses of the previous theorem, the persistence diagrams of MM(RlogU, f

)and Cech

(RlogU, f

)are at bottleneck distance bounded by log(c(s+ 2)).

Proof of Theorem 6.7. To fix notation, write U = Uε, MM(U, f) = Mε

sε,ε′−→ Mε′s≤ε≤ε′ , and

Cech(U, f) = Cε

ιε,ε′−→ Cε′s≤ε≤ε′ for the multiscale mapper and Cech filtrations, respectively.For each ε ≥ s we will define simplicial maps πε : Cε → Mc(2ε+s) and ψε : Mc(2ε+s) → Cε so

that in the following diagram πε ψε and sε,c(2ε+s) are contiguous, and πε ψc(2ε+s) and ιε,c(2ε+s)are also contiguous:

ψε

sε,c(2ε+s) //Mc(2ε+s)

ψc(2ε+s)

πε77

ιε,c(2ε+s) // Cc(2ε+s)

(6.8)

Recall that the vertex set of Cε is X, whereas the vertex set of Mε is the cover f∗(Uε).

24

Page 25: Tamal K. Dey, Facundo M emoli March 31, 2019 arXiv:1504 ... · Tamal K. Dey, Facundo M emoliy, Yusu Wang March 31, 2019 Abstract Summarizing topological information from datasets

The map πε : Cε → Mc(2ε+s). Consider the map πε : X → f∗(Uc(2ε+s)), x 7→ Vx ∈ f∗(Uc(2ε+s))such that Bε(x) ⊆ Vx. Such a Vx always exists. Indeed, note first that f(Bε(x)) = U ∈ Uε| f(x) ∈U. Then, it follows that diam(f(Bε(x))) ≤ 2ε. Now, invoke the fact that U is (c, s)-good andProposition 4.2 to conclude the existence of U ∈ Uc(2ε+s) such that f(Bε(x)) ⊂ U . Finally, pickVx ∈ cc(f−1(U)) s.t. Bε(x) ⊂ Vx.

The map πε induces the simplicial map πε shown in diagram (6.8). Indeed, assume σ =x0, . . . , xk ∈ Cε. Then,

⋂iBε(xi) 6= ∅. Since by construction πε(xi) = Vxi ⊇ Bε(xi) for all i, it

follows that⋂i πε(xi) 6= ∅ as well.

The map ψε : Mε → Cε. For all V ∈ f∗(Uε) pick a point xV ∈ V : This is a choice of arepresentative for each element of the pullback cover. Define ψε(V ) = xV . We now check that thisvertex map induces a simplicial map ψε. Assume that V0, V1, . . . , Vk ∈ f∗(Uε) are s.t.

⋂i Vi 6= ∅.

Note that by Lemma 6.5, each Vi satisfies Vi ⊆ Bε(xVi). It then follows that⋂iBε(xVi) 6= ∅,

implying that xV0 , . . . , xVk span a simplex in Cε.

Claim 6.9. The maps sε,c(2ε+s) and πε ψε in (6.8) are contiguous.

Proof. Assume that V0, V1, . . . , Vk ∈ f∗(Uε) are such that ∩iVi 6= ∅, and let Ui := sε,c(2ε+s)(Vi) foreach i. Since Vi ⊆ Ui, we have that ∩iVi ⊆ ∩iUi. On the other hand, let xVi := ψε(Vi) (i.e, xVi isthe representative of the set Vi). Note that Wi := πε(xVi) satisfies Wi ⊇ Bε(xVi) ⊇ Vi. We thenhave that ∩iVi ⊆ ∩iUi

⋂∩jWj . Thus ∩iUi

⋂∩jWj 6= ∅, implying that Ui∪Wi spans a simplex

in Mc(2ε+s). Hence sε,c(2ε+s) and πε ψε are contiguous.

Claim 6.10. The maps ιε,c(2ε+s) and ψc(2ε+s) πε in (6.8) are contiguous.

Proof. Specifically, consider σ = x0, . . . , xk ∈ Cε, and let Vi := πε(xi) for each i ∈ 0, . . . , k. Bydefinition of πε, we have that Bε(xi) ⊆ Vi for i ∈ 0, . . . , k.

Let xi := ψc(2ε+s)(Vi) for each i ∈ 0, . . . , k. Then, since Vi belongs to f∗(Uc(2ε+s)), bydefinition of xi and Lemma 6.5, we see that Vi ⊂ Bc(2ε+s)(xi), for each i ∈ 0, . . . , k , and thus

Bε(xi) ⊂ Bc(2ε+s)(xi) for i ∈ 0, . . . , k.

On the other hand one trivially has Bε(xi) ⊂ Bc(2ε+s)(xi) for i ∈ 0, . . . , k, so that thenBε(xi) ⊂ Bc(2ε+s)(xi) ∩Bc(2ε+s)(xi) for i ∈ 0, . . . , k. Thus,

∅ 6= ∩iBε(xi) ⊆⋂i

Bc(2ε+s)(xi) ∩Bc(2ε+s)(xi).

Hence vertices in ιε,c(2ε+s)(σ)∪ψc(2ε+s) πε(σ) span a simplex in Cc(2ε+s) establishing that thetwo maps ιε,c(2ε+s) and ψc(2ε+s) πε are contiguous.

Notice that since c, ε ≥ 1, then diagram (6.8) can be substituted by a diagram where the mapsπε have domain Cε and codomain Mε(c(s+2)). The rest of the proof ofTheorem 6.7 follows stepssimilar to those in the proof of Proposition 4.7.

7 Discussion

In this paper, we proposed, as well as studied theoretical and computionals aspects of, multiscalemapper, a construction which produces a multiscale summary of a map on a domain using a coveringof the codomain at different scales.

25

Page 26: Tamal K. Dey, Facundo M emoli March 31, 2019 arXiv:1504 ... · Tamal K. Dey, Facundo M emoliy, Yusu Wang March 31, 2019 Abstract Summarizing topological information from datasets

Given that in practice, hidden domains are often approximated by a set of discrete point samples,an important future direction will be to investigate how to approximate the multiscale mapper ofa map f : X → Z on the metric space X from a finite set of samples P lying on or around X aswell as function values f : P → Z at points in P . It will be particularly interesting to be able tohandle noise both in point samples P and in the observed values of the function f .

We also note that the multiscale mapper framework can be potentially extended to a zigzag-family of coverings, which will further increase the information encoded in the resulting summary. Itwill be interesting to study the theoretical properties of such a zigzag version of mapper. Its stabilityhowever appears challenging, given that the theory of stability of zigzag persistence modules is muchless developed than that for standard persistence modules. Finally, another interesting questionis to understand the continuous object that the mapper converges to as the scale of the coveringtends to zero. In particular, does the mapper converge to the Reeb space [14]?

Finally, it seems of interest to understand the features of a topological space which are capturedby Multiscale Mapper and its variants. Some related work in this direction in the context of Reebgraphs was reported in [10] and [9].

Acknowledgment. This work is partially supported by the National Science Foundation undergrants CCF-1064416, CCF-1116258, IIS-1422400, CCF-1319406, and CCF-1318595.

References

[1] D. Burghelea, and T. K. Dey. Topological persistence for circle-valued maps. Discrete & Com-putational Geometry 50.1 (2013), 69–98.

[2] G. Carlsson and A. Zomorodian. The theory of multidimensional persistence. Discrete & Com-putational Geometry 42.1 (2009), 71-93.

[3] F. Chazal, D. Cohen-Steiner, M. Glisse, L. Guibas, and S. Oudot. Proximity of persistencemodules and their diagrams. Proc. 25th Annu. Sympos. Comput. Geom. (2009), 237–246.

[4] F. Chazal and J. Sun. Gromov-Hausdorff approximation of filament structure using Reeb-typegraph. Proc. 30th Annu. Sympos. Comput. Geom. (SoCG) (2014), 491–500.

[5] C. Chen and M. Kerber. An output-sensitive algorithm for persistent homology. Proc. 27thAnnu. Sympos. Comput. Geom. (2011), 207–216.

[6] G. Carlsson and V. de Silva. Zigzag persistence. Found. Comput. Math., 10(4), 367–405, 2010.

[7] D. Cohen-Steiner, H. Edelsbrunner, and J. L. Harer. Stability of persistence diagrams. Disc.Comput. Geom. 37 (2007), 103-120.

[8] D. Cohen-Steiner, H. Edelsbrunner, J. Harer. Extending persistence using Poincare and Lef-schetz duality. Found. of Comp. Math. 9.1 (2009): 79-103.

[9] J. Curry. Personal communication.

[10] V. de Silva, E. Munch, and A. Patel. Categorified Reeb graphs. ArXiv preprintarXiv:1501.04147, (2015).

[11] T. K. Dey, F. Fan, and Y. Wang. Computing topological persistence for simplicial maps. Proc.30th Annu. Sympos. Comput. Geom. (SoCG) (2014), 345–354.

26

Page 27: Tamal K. Dey, Facundo M emoli March 31, 2019 arXiv:1504 ... · Tamal K. Dey, Facundo M emoliy, Yusu Wang March 31, 2019 Abstract Summarizing topological information from datasets

[12] V. de Silva, D. Morozov, and M. Vejdemo-Johansson. Persistent cohomology and circularcoordinates. Discrete Comput. Geom. 45 (4) (2011), 737–759.

[13] H. Edelsbrunner and J. Harer. Computational Topology: An Introduction. Amer. Math. Soc.,Providence, Rhode Island, 2009.

[14] H. Edelsbrunner, J. Harer, and A. K. Patel. 2008. Reeb spaces of piecewise linear mappings.In Proceedings of the twenty-fourth annual symposium on Computational geometry (SCG ’08).ACM, New York, NY, USA, 242-250.

[15] H. Edelsbrunner, D. Letscher, and A. Zomorodian. Topological persistence and simplification.Discrete Comput. Geom. 28 (2002), 511–533.

[16] P. Frosini. A distance for similarity classes of submanifolds of a Euclidean space, Bulletin ofthe Australian Mathematical Society. 42(3) (1990), 407–416.

[17] S. Har-Peled and M. Mendel. Fast construction of nets in low dimensional metrics, and theirapplications. SIAM Journal on Computing. 35(5) (2006), 1148–1184.

[18] P. Y. Lum et al. Extracting insights from the shape of complex data using topology. Scientificreports 3 (2013).

[19] Munkres, J.R., Topology, Prentice-Hall, Inc., New Jersey, 2000.

[20] N., Monica, A. J. Levine, and G. Carlsson. Topology based data analysis identifies a subgroupof breast cancers with a unique mutational profile and excellent survival. Proceedings of theNational Academy of Sciences 108.17 (2011): 7265-7270.

[21] V. Robins. Towards computing homology from finite approximations. Topology Proceedings.24(1). 1999.

[22] G. Singh, F. Memoli, and G. Carlsson. Topological Methods for the Analysis of High Dimen-sional Data Sets and 3D Object Recognition. Sympos. Point Based Graphics, 2007.

[23] A. Zomorodian and G. Carlsson. Computing persistent homology. Discrete Comput. Geom. 33(2005), 249–274.

27

Page 28: Tamal K. Dey, Facundo M emoli March 31, 2019 arXiv:1504 ... · Tamal K. Dey, Facundo M emoliy, Yusu Wang March 31, 2019 Abstract Summarizing topological information from datasets

A Constructing a good family of covers

Suppose we are given a compact metric space (Z, dZ) with bounded doubling dimension. We assumethat we can obtain an ν-sample P of (Z, dZ), which is a discrete set of points P ⊂ Z such that forany point z ∈ Z, dZ(z, P ) ≤ ν. For example, if the input metric space Z is a d-dimensional cubein the Euclidean space Rd, we can simply choose P to be the set of vertices from a d-dimensionallattice with edge length ν. For simplicity, assume that ν ≤ 1 (otherwise, we can rescale the metricto make this hold).

A.1 A simple (c, s)-good hierarchical family of coverings.

Consider the following hierarchical family of coverings W = Wε | ε ≥ 2ν where Wε := B ε2(u) |

u ∈ P. The associated maps of coverings wε,ε′ :Wε →Wε′ simply sends each element B ε2(u) ∈ Wε

to the corresponding set B ε′2

(u) ∈ Wε′ . It is easy to see that W is (3, 2ν)-good. Indeed, given any

O ⊆ Z with diameter R = diam(O) ≥ s = 2ν, pick an arbitrary point o ∈ O and let u ∈ P be anearest neighbor of o in P . We then have that for any point x ∈ O,

dZ(x, u) ≤ dZ(x, o) + dZ(o, u) ≤ R+ ν ≤ 3R

2.

That is, O ⊂ B 3R2

(u) ∈ W3R.

This hierarchical family of coverings however has large size. In particular, as the scale ε becomeslarge, the number of elements inWε remains the same, while intuitively, a much smaller subset willbe sufficient to cover Z. In what follows, we describe a different construction of a good hierarchicalfamily of coverings based on using the so-called nets.

A.2 A space-efficient (c, s)-good hierarchical family of coverings.

Following the notations used by Har-Peled and Mendel in [17], we have the following:

Proposition A.1 ([17]). For any constant ρ ≥ 11, and for any scale ` ∈ R+, one can compute aρ`-net N (`) ⊆ P in the sense that

(i) for any p ∈ P , dZ(p,N (`)) ≤ ρ`;(ii) any two points u, v ∈ N (`), dZ(u, v) ≥ ρ`−1/16, and

(iii) N (`′) ⊆ N (`) for any ` < `′; that is, the net at a bigger scale is a subset of net at a smallerscale.

Each N (`) is referred to as a net at scale `.

From now on, we set εi := 4(ρ+ 1)i for any positive integer i ∈ Z+. Consider the collection ofnets N (i) | i ∈ Z+ where N (i) is as described in Proposition A.1.

Definition A.2. We define a hierarchical family of coverings U = Uεi | i ∈ Z+ where:

Uεi :=

B εi2

(u) | u ∈ N (i), where recall that εi = 4(ρ+ 1)i. (A.9)

The associated maps of coverings uεi,εj : Uεi → Uεj , for any i, j ∈ Z+ with i 6= j, are defined asfollows:

(1) uεi,εi+1 : Uεi → Uεi+1: For any element U = B εi2

(u) ∈ Uεi, we find the nearest neighbor

v ∈ N (i+ 1) of u in N (i+ 1), and map U to V = B εi+12

(v) ∈ Uεi+1.

28

Page 29: Tamal K. Dey, Facundo M emoli March 31, 2019 arXiv:1504 ... · Tamal K. Dey, Facundo M emoliy, Yusu Wang March 31, 2019 Abstract Summarizing topological information from datasets

(2) For i < j − 1, we set uεi,εj as the concatenation uεj−1,εj uεj−2,εj−1 · · · uεi,εi+1.

Remark A.1. We note that the above HFoCs U is discrete in the sense that we only considercoverings Uε for a discrete set of εs from εi | i ∈ Z+. However, one can easily extend it toa HFoCs Uext = Uext

δ which is defined for all δ ∈ R+ by declaring that Uextδ = Uεζ(δ) where

ζ(δ) = maxi ∈ Z| εi ≤ δ. One may define the cover maps in Uext in a similar manner. Forsimplicity of exposition, in what follows we use a discrete HFoCs.

Claim A.3. For any i ∈ Z+, Uεi forms a covering of the metric space Z (where P are sampledfrom).

Proof. P is an ν-sampling of (Z, dZ) with ν ≤ 1. Hence for any point x ∈ Z, there is a pointpx ∈ P such that d(x, px) ≤ ε. By Property (i) of Definition A.1, dZ(px,N (i)) ≤ ρi. HencedZ(x,N (i)) ≤ dZ(x, px) + dZ(px,N (i)) ≤ 1 + ρi ≤ (ρ + 1)i. As such, there exists u ∈ N (i) suchthat dZ(x, u) ≤ (ρ+ 1)i; that is, x ∈ U = B εi

2(u) ∈ Uεi .

Claim A.4. For any i < j and any U ∈ Uεi, we have U ⊆ uεi,εj (U).

Proof. We show that the claim holds true for j = i + 1, and the claim then follows from this andthe construction of uεi,εj for i < j − 1.

Suppose U = B εi2

(u) for u ∈ N (i), and let V = uεi,εi+1(U). By construction, V = B εi+12

(v) is

such that v is the nearest neighbor of u in N (i+ 1). If u = v, then clearly U ⊆ V .If u 6= v, by Property (i) of Definition A.1, dZ(u, v) ≤ ρi+1. At the same time, any point x ∈ U

is within εi2 = 2(ρ+ 1)i distance to u. Hence

dZ(x, v) ≤ dZ(x, u) + dZ(u, v) ≤ 2(ρ+ 1)i + ρi+1 < 2(ρ+ 1)i+1.

It then follows that U ⊆ V .

Theorem A.5. U as constructed above is a (c, s)-good hierarchical family of coverings with c =s = 4(ρ+ 1).

Proof. First, note that by Claim A.3, each Uε ∈ U is indeed a cover for (Z, dZ). By Claim A.4, theassociated maps as constructed in Definition A.2 are valid maps of coverings. Hence U is indeed ahierarchical family of coverings for (Z, dZ). We now show that U is (c, s)-good.

First, by Eqn. A.9, each set U in Uεi obviously has diameter at most εi. Also, U is s-truncatedfor s = 4(ρ+ 1) since it starts with ε1 = s. Hence properties 1 and 2 of Definition 4.1 hold. Whatremains is to show that property 3 of Definition 4.1 also holds.

Specifically, consider any O ⊂ Z such that diam(O) ≥ s. Set R = diam(O) and let a be theunique integer such that

(ρ+ 1)a−1 ≤ R < (ρ+ 1)a.

Let o ∈ O be any point in O, and p ∈ P the nearest neighbor of o in P : By the sampling conditionof P , we have dZ(o, p) ≤ ν < 1. Let u ∈ N (a) be the nearest neighbor of p in the net N (a). ByDefinition A.1 (i), dZ(p, u) ≤ ρa. We thus have that, for any point x ∈ O,

dZ(x, u) ≤ dZ(x, o) + dZ(o, p) + dZ(p, u) ≤ R+ ε+ ρa < (ρ+ 1)a + 1 + ρa ≤ 2(ρ+ 1)a.

In other words, O ⊆ B2(ρ+1)a(u) = B εa2

(u) ∈ Uεa .

On the other hand, since R ≥ (ρ+ 1)a−1, we have that for c = 4(ρ+ 1),

εa = 4(ρ+ 1)a = c · (ρ+ 1)a−1 ≤ cR = c · diam(O).

29

Page 30: Tamal K. Dey, Facundo M emoli March 31, 2019 arXiv:1504 ... · Tamal K. Dey, Facundo M emoliy, Yusu Wang March 31, 2019 Abstract Summarizing topological information from datasets

Space-efficiency of U. In comparison with the simple (3, 2ρ)-good hierarchical family ofcoverings W that we introduced at the beginning of this section, the main advantage of U is itsmuch more compact size. Intuitively, this comes from property (ii) of Proposition A.2, which statesthat the points in the net N (i) are sparse and contain little redundancy. In fact, consider thecovering Uεi at scale εi. It size (i.e, the cardinality of Uεi) is close to optimal in the following sense:

For any ε, denote by V∗(ε) the smallest possible (in terms of cardinality) covering of (Z, dZ)such that each element V ∈ V∗(ε) has diameter at most ε. Now, let s∗(ε) := |V∗(ε)| denote thisoptimal size for any covering of (Z, dZ) by elements with diameter at most ε.

Proposition A.6. For any i ∈ Z+, s∗(εi) ≤ |Uεi | = |N (i)| ≤ s∗( εi16ρ).

Proof. The left inequality follows from the definition of s∗(εi). We now prove the right inequality.Specifically, consider the smallest covering V∗ = V∗( εi

16ρ) with s∗( εi16ρ) = |V∗|. By property (ii) of

Proposition A.2, each set V ∈ V∗ can contain at most one point from N (i) since the diameter ofV is at most εi

16ρ . At the same time, we know that the union of all sets in V∗ will cover all pointsin N (i). The right inequality then follows.

30