
Statistics for a Computational Topologist
Part I

Brittany Terese Fasy
TA: Samuel A. Micka

School of Computing and Dept. of Mathematical Sciences
Montana State University

August 14, 2018

B. Fasy (MSU) Statistics for a Computational Topologist August 14, 2018 1 / 25

Why Topological Data Analysis?

“Data has shape and the shape matters.” - Gunnar Carlsson

Today, data is high-dimensional, HUGE, and present everywhere

(Images: Nicolau, Levine, and Carlsson, PNAS 2011; http://astrobites.com/; www.mapconstruction.org)

... and needs to be summarized, analyzed, and compared!


What questions do we ask in data analysis?

Think! Write down one question (2 min)

Pair! Share with partner, and add more questions to your list (5 min)

Share! Raise hands please! (5 min)

More ideas? mickas37@gmail.com

Data Analysis Questions

Summarize and Analyze

What is this shape?

How many components / populations?

Can we categorize? (Classification)

What are the parameters? (Inference: Point Estimation)

How far do parameters likely lie from estimates? (Confidence Sets)

Compare

Are these the same? In distribution?

Has something changed? If so, what has changed?

Which is bigger?

Can we retain the null hypothesis? (Inference: Hypothesis Testing)

What is the relationship between X and Y ? (Regression)


Most Important Questions

1. Which descriptor best captures our data? (Descriptors, Confidence Sets)

2. How do we measure distance between descriptors? (Distances, Clustering)


Topological Descriptors


Stat References

Wasserman. All of Statistics: a Concise Course in Statistical Inference. Springer, 2010.

Givens and Hoeting. Computational Statistics. Wiley, 2013.


Stat Slide: The Basics

Let $F$ be a probability distribution with density $f$.

$X \sim F$ reads “$X$ has distribution $F$”.

Here, $X$ is called a random variable.

Expectation: $\mathbb{E}(X) = \int x \, dF(x)$.

Quantile function: $\mathrm{CDF}^{-1}(q)$.

[Figure: two plots over −4 ≤ x ≤ 4, a density (y-axis 0.0 to 0.3) and a CDF (y-axis 0.0 to 1.0)]
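To make expectation and the quantile function concrete, here is a minimal sketch using Python's standard-library `statistics.NormalDist`. The choice $F = N(1, 1)$ is an assumption for illustration only; $\mathbb{E}(X)$ is approximated by a midpoint Riemann sum of $x f(x)$, and the quantile function is `inv_cdf`.

```python
from statistics import NormalDist

# Illustrative assumption: F is the normal distribution N(1, 1).
F = NormalDist(mu=1.0, sigma=1.0)


def expectation(dist, steps=20_000):
    """Approximate E(X) = ∫ x dF(x) = ∫ x f(x) dx by a midpoint Riemann sum
    over [mu - 10 sigma, mu + 10 sigma], where almost all of the mass lies."""
    lo = dist.mean - 10 * dist.stdev
    hi = dist.mean + 10 * dist.stdev
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width
        total += x * dist.pdf(x) * width
    return total


def quantile(dist, q):
    """Quantile function CDF^{-1}(q) for 0 < q < 1."""
    return dist.inv_cdf(q)


if __name__ == "__main__":
    print(expectation(F))             # approximately 1.0
    print(quantile(F, 0.5))           # the median of N(1, 1)
    print(F.cdf(quantile(F, 0.975)))  # the CDF undoes the quantile function
```

For a continuous, strictly increasing CDF, `cdf` and `inv_cdf` are inverses, which is what the last line checks numerically.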


Prob/Stat Slide: Descriptors and Limit Theory

Let F be some distribution.

Let $X_1, X_2, \ldots, X_n \sim F$ (the data).

A statistic or descriptor is a function of the data: $T(X_1, X_2, \ldots, X_n)$ or $T(X^n)$.

Sample average: $\overline{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$.

Law of Large Numbers

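$\overline{X}_n$ converges to $\mathbb{E}(X_i)$ in probability:

$\forall \varepsilon > 0, \quad \lim_{n \to \infty} P\left(|\overline{X}_n - \mathbb{E}(X_i)| > \varepsilon\right) = 0.$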

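Central Limit Theorem

$\sqrt{n}\,(\overline{X}_n - \mathbb{E}(X_i))$ converges in distribution to a Normal distribution (namely $N(0, \sigma^2)$ when $\sigma^2 = \mathrm{Var}(X_i) < \infty$), i.e., the sample average is approximately Normal for large enough samples.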
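A small simulation can make both limit theorems visible. This is a sketch under an illustrative assumption, $F = \mathrm{Exponential}(1)$, so that $\mathbb{E}(X_i) = 1$ and $\mathrm{Var}(X_i) = 1$; the function names are hypothetical.

```python
import math
import random
import statistics

# Illustrative assumption: F = Exponential(rate 1), so E(X_i) = 1 and Var(X_i) = 1.
MEAN, VAR = 1.0, 1.0


def sample_average(xs):
    """The descriptor T(X^n) = (1/n) * sum of the X_i."""
    return sum(xs) / len(xs)


def draw_average(n, rng):
    """Draw X_1, ..., X_n ~ F and return their sample average."""
    return sample_average([rng.expovariate(1.0) for _ in range(n)])


def clt_replicates(n, reps, rng):
    """Return `reps` independent draws of sqrt(n) * (Xbar_n - E(X_i)).
    By the CLT these are approximately N(0, Var(X_i)) for large n."""
    return [math.sqrt(n) * (draw_average(n, rng) - MEAN) for _ in range(reps)]


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (10, 1_000, 100_000):
        print(n, draw_average(n, rng))  # LLN: approaches E(X_i) = 1 as n grows
    z = clt_replicates(n=500, reps=1_000, rng=rng)
    print(statistics.mean(z), statistics.variance(z))  # roughly 0 and Var(X_i) = 1
```

The exponential distribution is skewed, so the near-Normal behavior of the rescaled averages comes from the CLT, not from the shape of $F$.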


Data as Point Clouds


[Figure: point cloud annotated “big loop”]


Data as Persistence Diagrams
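As a minimal illustration of turning a point cloud into a persistence diagram, the sketch below computes only the dimension-0 diagram of the Vietoris-Rips filtration, where an edge of length $r$ enters at scale $r$. The filtration choice is an assumption made for brevity: the slides later use KDE and DTM filtrations, and loops are dimension-1 features, which this sketch does not compute.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Dimension-0 persistence diagram of the Vietoris-Rips filtration.

    Every point is born at 0. Edges are added in order of length (Kruskal's
    algorithm with union-find); when an edge of length r merges two
    components, one of them dies at r. One component never dies (death = inf).
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    diagram = []
    for r, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components: one of them dies at r
            parent[ri] = rj
            diagram.append((0.0, r))
    diagram.append((0.0, math.inf))
    return diagram


if __name__ == "__main__":
    # Two well-separated pairs of points: expect deaths at 1, 1, 10, and inf.
    print(h0_persistence([(0, 0), (0, 1), (10, 0), (10, 1)]))
```

The long-lived point $(0, 10)$ records the two clusters; short-lived points record merges within a cluster.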

Confidence Sets for Persistence Diagrams: Analyzing Descriptors


Objective

To Find a Threshold

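Given $\alpha \in (0, 1)$, we will find $q_\alpha > 0$ such that

$P(W_\infty(D, D_n) \le q_\alpha) \ge 1 - \alpha$,

where $D$ is the persistence diagram of the underlying (unknown) distribution, $D_n$ is the diagram estimated from the $n$ data points, and $W_\infty$ is the bottleneck distance.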

References

BTF, Lecci, Rinaldo, Wasserman, Balakrishnan, and Singh. Confidence sets for persistence diagrams. Annals of Stat., 2014.

Chazal, BTF, Lecci, Rinaldo, Singh, and Wasserman. On the Bootstrap for Persistence Diagrams and Landscapes. Modeling and Analysis of Information Systems, 2013.

Chazal, BTF, Lecci, Michel, Rinaldo, and Wasserman. Robust Topological Inference: Distance To a Measure and Kernel Distance. JMLR 18(159):1–40, 2018.


Stat Slide: Bootstrapping

Old idiom: “pull yourself up by your bootstraps”

Want: a parameter of an unknown distribution $F$.

Try: estimate it using the empirical distribution $\widehat{F}_n$.

Nonparametric technique!
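A minimal sketch of the nonparametric bootstrap: sampling from $\widehat{F}_n$ is the same as resampling the data with replacement, so the spread of a statistic across resamples estimates its sampling variability under the unknown $F$. The function names and the Gaussian example data are illustrative assumptions.

```python
import random
import statistics


def bootstrap(data, statistic, B, rng):
    """Draw B samples of size n from the empirical distribution F_n (i.e.,
    resample the data with replacement) and evaluate the statistic on each."""
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(B)]


def bootstrap_se(data, statistic, B=1_000, seed=0):
    """Bootstrap estimate of the standard error of `statistic`."""
    reps = bootstrap(data, statistic, B, random.Random(seed))
    return statistics.stdev(reps)


if __name__ == "__main__":
    rng = random.Random(42)
    # Illustrative data: 100 draws from a distribution we pretend not to know.
    data = [rng.gauss(0.0, 2.0) for _ in range(100)]
    print(bootstrap_se(data, statistics.median))
    print(bootstrap_se(data, statistics.mean))  # compare with 2 / sqrt(100) = 0.2
```

The same recipe works for statistics with no convenient formula for their standard error, such as the median, which is the point of the technique.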


Bottleneck Bootstrap

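We have a point cloud sample: $S_n = \{X_1, \ldots, X_n\}$, $X_i \sim P$.

Subsample (with replacement), obtaining $X^* = \{X_1^*, \ldots, X_b^*\}$.

Compute $\Theta_b^*(X^*) = W_\infty(X^*, S_n)$, the bottleneck distance between the persistence diagrams of the KDE or DTM built on $X^*$ and on $S_n$.

Consider all possible outcomes:

$\{\Theta_b^*(X^*)\}_{X^* \subset S_n}$

Mimics:

$\{\Theta(X) = W_\infty(S_n, M)\}_{S_n \subset M}$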
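Computing bottleneck distances for every resample is expensive, so one workable shortcut for the KDE case is to bootstrap the sup-norm distance between density estimates instead: by the stability theorem, $W_\infty$ between the diagrams of two functions is at most their sup-norm distance, so a $(1 - \alpha)$ bootstrap quantile of $\|\hat{p}^*_h - \hat{p}_h\|_\infty$ gives a conservative threshold. The sketch below assumes a Gaussian kernel, approximates the sup by a max over a finite grid, and uses hypothetical function names and parameter values; it is an illustration, not the referenced papers' implementation.

```python
import math
import random


def gaussian_kde(sample, h):
    """Gaussian kernel density estimate in R^d with bandwidth h."""
    n, d = len(sample), len(sample[0])
    norm = n * (2 * math.pi * h * h) ** (d / 2)

    def density(x):
        return sum(math.exp(-math.dist(x, xi) ** 2 / (2 * h * h)) for xi in sample) / norm

    return density


def bootstrap_threshold(sample, grid, h, alpha, B, rng):
    """Approximate q_alpha by the (1 - alpha) quantile of sup |p*_h - p_h| over
    B bootstrap resamples; the sup is approximated by a max over `grid`.
    By stability, W_inf(Dgm(p*_h), Dgm(p_h)) <= sup |p*_h - p_h|."""
    p_n = gaussian_kde(sample, h)
    base = [p_n(g) for g in grid]
    sup_dists = []
    for _ in range(B):
        p_star = gaussian_kde(rng.choices(sample, k=len(sample)), h)
        sup_dists.append(max(abs(p_star(g) - b) for g, b in zip(grid, base)))
    sup_dists.sort()
    k = max(0, math.ceil((1 - alpha) * B) - 1)  # index of the (1 - alpha) quantile
    return sup_dists[k]


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative data: 60 noisy points near the unit circle (one big loop).
    sample = []
    for _ in range(60):
        t = rng.uniform(0, 2 * math.pi)
        sample.append((math.cos(t) + rng.gauss(0, 0.1), math.sin(t) + rng.gauss(0, 0.1)))
    grid = [(-1.5 + 0.2 * i, -1.5 + 0.2 * j) for i in range(16) for j in range(16)]
    print(bootstrap_threshold(sample, grid, h=0.3, alpha=0.05, B=40, rng=rng))
```

A finer grid and a larger B give a better approximation of the sup and of the quantile, at a higher computational cost.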


Confidence Sets for Persistence Diagrams

$C_\alpha = \{D \in \mathcal{D}_T : W_\infty(D, D_n) \le q_\alpha\}$
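One way to read $C_\alpha$: the bottleneck distance with the $L^\infty$ ground metric charges $(\text{death} - \text{birth})/2$ to match a point to the diagonal. A point of $D_n$ farther than $q_\alpha$ from the diagonal therefore cannot be matched to the diagonal within budget $q_\alpha$, so every diagram in $C_\alpha$ must contain a point near it. A small sketch of that test follows; the function name and example diagram are hypothetical.

```python
import math


def significant_points(diagram, q_alpha):
    """Return the points (birth, death) of D_n whose L-infinity distance to the
    diagonal, (death - birth) / 2, exceeds q_alpha. Each such point must be
    matched to an off-diagonal point in every diagram D of C_alpha."""
    return [(b, d) for b, d in diagram if (d - b) / 2 > q_alpha]


if __name__ == "__main__":
    D_n = [(0.0, 1.0), (0.0, 0.1), (0.3, 0.45), (0.2, math.inf)]
    print(significant_points(D_n, q_alpha=0.2))  # [(0.0, 1.0), (0.2, inf)]
```

Equivalently, features with lifetime greater than $2 q_\alpha$ are kept, and everything inside the band around the diagonal is treated as possible noise.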


Example

[Figure: “Noisy Grid” point cloud with persistence diagrams for the KDE (h=0.05) and DTM (m=0.01) filtrations, showing dim 0 and dim 1 features]


Challenges

Techniques

Prove limit theorems.

Determine suitable assumptions on the input.

Use the geometry of the input (e.g., properties of an underlying smooth manifold).

Questions

These results hold in the limit. When is n big enough?

What confidence sets can we construct in the multi-d setting?

What is the optimal threshold for particular filtrations?

Power analysis: are the rejected points topologically insignificant? (Type II errors)


Distance Measures: Comparing Descriptors

Distances Between Diagrams

Bottleneck d∞.

Interleaving distance.

Wasserstein dp.

Erosion distance.
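For intuition about the first and third of these, here is a brute-force sketch of the bottleneck distance $d_\infty$ and the Wasserstein distance $d_p$ between two small diagrams with finite points, using the $L^\infty$ ground metric and allowing any point to be matched to the diagonal. It tries every matching, so it only works for a handful of points; practical libraries use much faster algorithms. Function names are hypothetical.

```python
from itertools import permutations


def _cost_matrix(A, B):
    """Square cost matrix for matching diagram A to diagram B when any point may
    instead be matched to the diagonal (L-infinity ground metric).
    Rows: points of A, then len(B) diagonal slots; columns: points of B, then
    len(A) diagonal slots."""
    m, n = len(A), len(B)
    size = m + n
    cost = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i < m and j < n:  # point of A to point of B
                cost[i][j] = max(abs(A[i][0] - B[j][0]), abs(A[i][1] - B[j][1]))
            elif i < m:          # point of A to the diagonal
                cost[i][j] = (A[i][1] - A[i][0]) / 2
            elif j < n:          # diagonal to point of B
                cost[i][j] = (B[j][1] - B[j][0]) / 2
            # diagonal slot to diagonal slot costs 0
    return cost


def bottleneck(A, B):
    """d_inf(A, B): minimize, over all matchings, the largest matched cost."""
    c = _cost_matrix(A, B)
    return min(max((c[i][perm[i]] for i in range(len(c))), default=0.0)
               for perm in permutations(range(len(c))))


def wasserstein(A, B, p=2):
    """d_p(A, B): minimize the p-th root of the sum of matched costs^p."""
    c = _cost_matrix(A, B)
    best = min(sum(c[i][perm[i]] ** p for i in range(len(c)))
               for perm in permutations(range(len(c))))
    return best ** (1 / p)


if __name__ == "__main__":
    A = [(0.0, 1.0), (0.0, 0.2)]
    B = [(0.0, 1.1)]
    print(bottleneck(A, B), wasserstein(A, B, p=1))
```

The bottleneck distance only sees the worst matched pair, while $d_p$ accumulates every mismatch, so small noisy points affect $d_p$ more.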

Question

Can we define a centroid / Fréchet mean?

$\arg\min_{D} \sum_{i} W_\infty^2(D, D_i)$

1. Turner, Mileyko, Mukherjee, and Harer. Fréchet Means for Distributions of Persistence Diagrams. DCG, 2014.

2. Munch, Turner, Bendich, Mukherjee, Mattingly, and Harer. Probabilistic Fréchet Means for Time Varying Persistence Diagrams. Electronic Journal of Statistics, 2015.


Clustering ... and Classification

Clustering (Unsupervised Learning)

Hierarchical: agglomerative or divisive.

k-means: the underlying optimization problem is NP-hard, so algorithms find a local minimum.

Distribution- and density-based clustering: e.g., DBSCAN.

Fuzzy clustering: membership is not binary.
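For k-means, one common local-search heuristic is Lloyd's algorithm; the slide does not name a specific algorithm, so treat this as an illustrative choice. The sketch below works for points in $\mathbb{R}^d$ given as tuples, and its result depends on the random initialization, which is exactly the "local minimum" caveat above.

```python
import math
import random


def k_means(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate (1) assigning each point to its nearest
    center and (2) moving each center to the mean of its cluster. It stops at
    a local minimum of the within-cluster sum of squares."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize with k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(x, centers[c])) for x in points]
        new_centers = []
        for c in range(k):
            members = [x for x, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center in place
        if new_centers == centers:  # assignments are stable: converged
            break
        centers = new_centers
    return centers, labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(k_means(pts, k=2))
```

Running it from several seeds and keeping the solution with the smallest within-cluster sum of squares is a standard way to reduce the effect of a poor start.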

Classification (Supervised Learning)

Input data (training sample): $\mathcal{D} = \{(X_i, Y_i)\}_{i=1}^{n}$

k-NN classification: for a new $X$, we predict $Y$ by majority vote over the labels $Y_i$ of the $k$ covariates (features) $X_i$ in $\mathcal{D}$ nearest to $X$.
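A minimal sketch of k-NN classification with a pluggable distance, so the same rule applies whether the covariates are points in $\mathbb{R}^d$ or topological descriptors compared with some distance between diagrams. The function name is hypothetical, and the Euclidean default is an assumption.

```python
import math
from collections import Counter


def knn_predict(train, x, k, dist=math.dist):
    """Predict a label for x by majority vote among the labels Y_i of the k
    training covariates X_i closest to x. `train` is a list of (X_i, Y_i)
    pairs. Ties in the vote go to the label seen first, i.e., closest to x."""
    neighbors = sorted(train, key=lambda pair: dist(pair[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    train = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
    print(knn_predict(train, (0.2, 0.3), k=3))  # a
    print(knn_predict(train, (5.0, 5.2), k=3))  # b
```

Unlike the clustering methods above, k-NN needs labeled training data and does no fitting; all the work happens at prediction time.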


Homework!

Curate a list of topological descriptors. For each, we are looking for:

Name of descriptor.

List of distances that can be used between descriptors.

Short explanation (very short).

Reference to where first used, or a good use of it.

Pros: What is it good for?

Cons: Where / when is it insufficient?

https://github.com/compTAG/ima-multid
