Statistics for a Computational Topologist
Part I

Brittany Terese Fasy
TA: Samuel A. Micka

School of Computing and Dept. of Mathematical Sciences
Montana State University

August 14, 2018

B. Fasy (MSU) Statistics for a Computational Topologist August 14, 2018 1 / 25


Why Topological Data Analysis?

“Data has shape and the shape matters.” - Gunnar Carlsson

Today, data is high-dimensional, HUGE, present everywhere

[Figures credited to: Nicolau, Levine, and Carlsson, PNAS 2011; http://astrobites.com/; www.mapconstruction.org]

... and needs to be summarized, analyzed, and compared!


What questions do we ask in data analysis?

Think! Write down one question (2 min)

Pair! Share with partner, and add more questions to your list (5 min)

Share! Raise hands please! (5 min)

More ideas? [email protected]


Data Analysis Questions

Summarize and Analyze

What is this shape?

How many components / populations?

Can we categorize? (Classification)

What are the parameters? (Inference: Point Estimation)

How far do parameters likely lie from estimates? (Confidence Sets)

Compare

Are these the same? In distribution?

Has something changed? If so, what has changed?

Which is bigger?

Can we retain the null hypothesis? (Inference: Hypothesis Testing)

What is the relationship between X and Y? (Regression)


Most Important Questions

1. Which descriptor best captures our data?

Descriptors

Confidence Sets

2. How do we measure distance between descriptors?

Distances

Clustering


Descriptors

Topological Descriptors


Stat References

Wasserman. All of Statistics: a Concise Course in Statistical Inference. Springer, 2010.

Givens and Hoeting. Computational Statistics. Wiley, 2013.


Stat Slide: The Basics

Let F be a probability distribution with density f.

X ∼ F reads “X has distribution F”.

Here, X is called a random variable.

Expectation: $E(X) = \int x \, dF(x)$.

Quantile function: $\mathrm{CDF}^{-1}(q)$.

[Figure: two plots over the horizontal range −4 to 4; the vertical axes run from 0.0 to 0.4 and from 0.0 to 1.0.]
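As a concrete illustration of these definitions, the following sketch uses Python's standard-library `statistics.NormalDist`. Taking F to be the standard Normal, and the integration range and step count in the hypothetical helper `expectation`, are assumptions made only for this example; they do not come from the slides.

```python
from statistics import NormalDist

# Assumption for illustration: F is the standard Normal distribution.
F = NormalDist(mu=0.0, sigma=1.0)

def expectation(dist, lo=-10.0, hi=10.0, steps=20_000):
    """Approximate E(X) = integral of x f(x) dx with a midpoint Riemann sum."""
    width = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * width
        total += x * dist.pdf(x) * width
    return total

mean_estimate = expectation(F)   # numerical approximation of E(X)
q = 0.975
x_q = F.inv_cdf(q)               # quantile function CDF^{-1}(q)
round_trip = F.cdf(x_q)          # applying the CDF to the quantile recovers q
```

The quantile function is the inverse of the CDF, so `round_trip` should agree with `q` up to floating-point error.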


Prob/Stat Slide: Descriptors and Limit Theory

Let F be some distribution.

Let $X_1, X_2, \ldots, X_n \sim F$. (The data).

A statistic or descriptor is a function of the data: $T(X_1, X_2, \ldots, X_n)$, or $T(X^n)$ for short.

Sample average: $\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$.

Law of Large Numbers

$\bar{X}_n$ converges to $E(X_i)$ in probability:

$\forall \varepsilon > 0, \quad \lim_{n \to \infty} P\left(|\bar{X}_n - E(X_i)| > \varepsilon\right) = 0.$

Central Limit Theorem

$\sqrt{n}\,(\bar{X}_n - E(X_i))$ converges in distribution to a Normal distribution, i.e., the sample average is approximately Normal for large enough samples.
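Both limit statements can be checked empirically with a short simulation. The sketch below is illustrative only: the choice of an Exponential(1) distribution for F (so that the expectation and standard deviation of each draw are both 1), the sample sizes, and the helper name `sample_mean` are assumptions, not part of the slides.

```python
import random
import statistics

def sample_mean(n, rng):
    """Average of n independent draws from Exponential(rate=1), whose expectation is 1."""
    return sum(rng.expovariate(1.0) for _ in range(n)) / n

rng = random.Random(0)

# Law of Large Numbers: the sample average approaches E(X_i) = 1 as n grows.
lln_errors = {n: abs(sample_mean(n, rng) - 1.0) for n in (10, 1_000, 100_000)}

# Central Limit Theorem: sqrt(n) * (sample mean - E(X_i)) is roughly Normal
# with mean 0 and standard deviation sd(X_i) = 1.
n, reps = 200, 2_000
scaled = [n ** 0.5 * (sample_mean(n, rng) - 1.0) for _ in range(reps)]
clt_mean = statistics.fmean(scaled)
clt_sd = statistics.stdev(scaled)
```

Here `lln_errors` records how far the sample average is from the expectation at each sample size, and `clt_mean` and `clt_sd` summarize the rescaled averages, which should be centered near 0 with spread near 1.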


Data as Point Clouds


[Figure: point cloud with labeled features: big loop, noise, pinch]


Data as Persistence Diagrams


Confidence Sets

Confidence Sets for Persistence Diagrams: Analyzing Descriptors


Objective

To Find a Threshold

Given $\alpha \in (0, 1)$, we will find $q_\alpha > 0$ such that

$P(W_\infty(D, D_n) \le q_\alpha) \ge 1 - \alpha.$

References

BTF, Lecci, Rinaldo, Wasserman, Balakrishnan, and Singh. Confidence sets for persistence diagrams. Annals of Stat., 2014.

Chazal, BTF, Lecci, Rinaldo, Singh, and Wasserman. On the Bootstrap for Persistence Diagrams and Landscapes. Modeling and Analysis of Information Systems, 2013.

Chazal, BTF, Lecci, Michel, Rinaldo, and Wasserman. Robust Topological Inference: Distance To a Measure and Kernel Distance. JMLR 18(159):1–40, 2018.


Stat Slide: Bootstrapping

Old idiom: “pull yourself up by your bootstraps”

Want: a parameter of an unknown distribution F.

Try: estimate using the empirical distribution $\hat{F}$.

Nonparametric technique!
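The idea can be sketched in a few lines: resampling the data with replacement is the same as drawing from the empirical distribution, and the spread of a statistic across resamples estimates its sampling variability. The data, the choice of the median as the parameter of interest, and the helper name `bootstrap_replicates` are assumptions made for illustration.

```python
import random
import statistics

def bootstrap_replicates(data, statistic, n_boot, rng):
    """Evaluate `statistic` on n_boot resamples drawn from the empirical distribution."""
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_boot)]

rng = random.Random(1)
# Hypothetical data: 60 draws from a distribution the analyst treats as unknown.
data = [rng.expovariate(1.0) for _ in range(60)]

reps = sorted(bootstrap_replicates(data, statistics.median, n_boot=1_000, rng=rng))
std_error = statistics.stdev(reps)   # bootstrap standard error of the median
lower, upper = reps[24], reps[974]   # rough 95% percentile interval
```

No parametric form for F is assumed anywhere in this sketch, which is what makes the technique nonparametric.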


Bottleneck Bootstrap

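We have a point cloud sample: $S_n = \{X_1, \ldots, X_n\}$; $X_i \sim P$.

Subsample (with replacement), obtaining: $X^* = \{X^*_1, \ldots, X^*_b\}$.

Compute $\Theta^*_b(X^*) = W_\infty(X^*, S_n)$ using a kernel density estimate (KDE) or the distance to a measure (DTM).

Consider all possible outcomes:

$\{\Theta^*_b(X^*)\}_{X^* \subset S_n}$

Mimics:

$\{\Theta(X) = W_\infty(S_n, M)\}_{S_n \subset M}$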
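The resampling loop above can be sketched without a topology library by swapping in a simpler distance. The code below is not the bottleneck computation itself: instead of $W_\infty$ between persistence diagrams, it bootstraps the sup-norm distance between Gaussian kernel density estimates evaluated on a grid, which by the stability theorem for persistence diagrams upper-bounds the bottleneck distance between the diagrams of the two functions. The 1-D data, grid, bandwidth `h`, and the function names `kde`, `sup_distance`, and `bootstrap_threshold` are all assumptions for illustration.

```python
import math
import random

def kde(points, grid, h):
    """Gaussian kernel density estimate of 1-D points, evaluated at each grid value."""
    scale = 1.0 / (len(points) * h * math.sqrt(2.0 * math.pi))
    return [scale * sum(math.exp(-0.5 * ((g - x) / h) ** 2) for x in points)
            for g in grid]

def sup_distance(f_vals, g_vals):
    """Largest absolute difference between two functions sampled on the same grid."""
    return max(abs(a - b) for a, b in zip(f_vals, g_vals))

def bootstrap_threshold(sample, grid, h, alpha, n_boot, rng, b=None):
    """Return (q_alpha, thetas): the empirical (1 - alpha) quantile of the
    bootstrapped distances, together with the sorted distances."""
    b = len(sample) if b is None else b
    base = kde(sample, grid, h)
    thetas = sorted(
        sup_distance(kde(rng.choices(sample, k=b), grid, h), base)
        for _ in range(n_boot)
    )
    index = min(n_boot - 1, math.ceil((1.0 - alpha) * n_boot) - 1)
    return thetas[index], thetas

rng = random.Random(2)
# Hypothetical point cloud: two clusters on the real line.
sample = ([rng.gauss(-2.0, 0.5) for _ in range(40)]
          + [rng.gauss(2.0, 0.5) for _ in range(40)])
grid = [-5.0 + 0.25 * k for k in range(41)]
alpha = 0.1
q_alpha, thetas = bootstrap_threshold(sample, grid, h=0.4, alpha=alpha,
                                      n_boot=200, rng=rng)
```

By construction, at least a fraction $1 - \alpha$ of the bootstrapped distances in `thetas` are at most `q_alpha`. Replacing `sup_distance` with a true bottleneck distance between diagrams would bring the sketch closer to the procedure on the slide.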

Confidence Sets for Persistence Diagrams

$C_\alpha = \{D \in \mathcal{D}_T : W_\infty(D, D_n) \le q_\alpha\}$
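
One common way to read such a confidence set: in the bottleneck metric, a point $(b, d)$ can be matched to the diagonal at cost $(d - b)/2$, so any point with $(d - b)/2 \le q_\alpha$ can be dropped without leaving $C_\alpha$ and cannot be distinguished from noise. The sketch below applies that rule; the function names, the toy diagram, and the value of `q_alpha` are hypothetical.

```python
def cost_to_diagonal(point):
    """L-infinity distance from a diagram point (birth, death) to the diagonal."""
    birth, death = point
    return (death - birth) / 2.0

def split_by_significance(diagram, q_alpha):
    """Separate points farther than q_alpha from the diagonal (significant)
    from those within q_alpha of it (indistinguishable from noise)."""
    significant = [p for p in diagram if cost_to_diagonal(p) > q_alpha]
    noise = [p for p in diagram if cost_to_diagonal(p) <= q_alpha]
    return significant, noise

# Hypothetical diagram and threshold, for illustration only.
diagram = [(0.0, 1.2), (0.1, 0.25), (0.3, 0.9), (0.5, 0.55)]
significant, noise = split_by_significance(diagram, q_alpha=0.2)
print("significant:", significant)
print("noise:", noise)
```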

Example

[Figure: persistence diagrams for the Noisy Grid example. Panels: Noisy Grid; KDE h=0.05; DTM m=0.01. Axes: Birth (horizontal), Death (vertical). Legend: dim 0, dim 1.]

Challenges

Techniques

Prove limit theorems.

Determine suitable assumptions on input.

Use the geometry of input (e.g., properties of an underlying smooth manifold).

Questions

These results are in the limit. When is n big enough?

What confidence sets can we construct in the multi-d setting?

What is the optimal threshold for particular filtrations?

Power analysis: are the rejected points topologically insignificant? (Type II errors)

Distances

Distance Measures: Comparing Descriptors

?=

Distances Between Diagrams

Bottleneck $d_\infty$.

Interleaving distance.

Wasserstein $d_p$.

Erosion distance.

Question

Can we define a centroid / Fréchet mean?

$\arg\min_D \sum_i W_\infty^2(D, D_i)$

1. Turner, Mileyko, Mukherjee, and Harer. Fréchet Means for Distributions of Persistence Diagrams. DCG, 2014.
2. Munch, Turner, Bendich, Mukherjee, Mattingly, and Harer. Probabilistic Fréchet Means for Time Varying Persistence Diagrams. Electronic Journal of Statistics, 2015.
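
To make the objective above concrete, here is a small sketch that computes the bottleneck distance by brute force (trying every matching, so it is only usable on tiny diagrams) and then evaluates $\sum_i W_\infty^2(D, D_i)$ for each candidate $D$ taken from the sample. Restricting the candidates to the observed diagrams yields a medoid, which is a simplification and not the Fréchet mean studied in the references. All function names and the toy diagrams are hypothetical.

```python
from itertools import permutations

def bottleneck(A, B):
    """Brute-force bottleneck distance between two small diagrams, each a
    list of (birth, death) pairs. Each side is padded with diagonal slots."""
    n, m = len(A), len(B)
    size = n + m  # A gets m diagonal slots, B gets n diagonal slots

    def cost(i, j):
        a = A[i] if i < n else None  # None means "a diagonal slot"
        b = B[j] if j < m else None
        if a is None and b is None:
            return 0.0
        if a is None:
            return (b[1] - b[0]) / 2.0
        if b is None:
            return (a[1] - a[0]) / 2.0
        return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

    best = float("inf")
    for perm in permutations(range(size)):
        worst = max((cost(i, perm[i]) for i in range(size)), default=0.0)
        best = min(best, worst)
    return best

def frechet_cost(D, diagrams):
    """Sum of squared bottleneck distances from D to each diagram."""
    return sum(bottleneck(D, Di) ** 2 for Di in diagrams)

# Hypothetical diagrams, for illustration only.
diagrams = [[(0.0, 2.0)], [(0.0, 2.5)], [(0.0, 2.2), (1.0, 1.1)]]
medoid = min(diagrams, key=lambda D: frechet_cost(D, diagrams))
print("medoid:", medoid)
```

The factorial running time is the price of keeping the sketch dependency-free; TDA libraries implement much faster matching algorithms.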

Clustering ... and Classification

Clustering (Unsupervised Learning)

Hierarchical: agglomerative or divisive.

k-means: NP-hard in general, so practical algorithms (e.g., Lloyd's) converge to a local minimum.

Distribution- and density-based clustering: e.g., DBSCAN.

Fuzzy clustering: membership is not binary.
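
As an illustration of the k-means point, the sketch below implements Lloyd's algorithm, which alternates between assigning points to the nearest center and recomputing centers. It only guarantees a local minimum of the within-cluster sum of squares, so a common heuristic (assumed here, not from the slides) is to restart from several random initializations and keep the best run. The function name and toy points are hypothetical.

```python
import math
import random

def lloyd_kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm; returns (centers, cost) at a local minimum."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize at k distinct data points
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[j].append(p)
        # Update step: each center moves to the mean of its cluster.
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    cost = sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)
    return centers, cost

# Hypothetical data: two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
runs = [lloyd_kmeans(points, k=2, seed=s) for s in range(10)]
best_centers, best_cost = min(runs, key=lambda run: run[1])
print(best_centers, round(best_cost, 3))
```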

Classification (Supervised Learning)

Input data (training sample): $D = \{(X_i, Y_i)\}_{i=1}^n$

k-NN classification: for a new X, we predict Y by majority vote among the k training points in D whose covariates (features) are nearest to X.
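
The k-nearest-neighbor rule can be sketched in a few lines. The distance is passed in as a parameter, so the Euclidean default used here could in principle be swapped for a distance between topological descriptors; that substitution, the function name `knn_predict`, and the toy training set are assumptions of this sketch.

```python
import math
from collections import Counter

def knn_predict(train, x_new, k=3, distance=math.dist):
    """Predict the label of x_new by majority vote among its k nearest
    training points. `train` is a list of (features, label) pairs."""
    neighbors = sorted(train, key=lambda pair: distance(pair[0], x_new))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training sample D = {(X_i, Y_i)}.
D = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
     ((1.0, 1.0), "b"), ((0.9, 1.1), "b"), ((1.1, 0.9), "b")]
print(knn_predict(D, (0.15, 0.1)))
```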

Homework!

Curate a list of topological descriptors. For each, we are looking for:

Name of descriptor.

List of distances that can be used between descriptors.

Short explanation (very short).

Reference to where first used, or a good use of it.

Pros: What is it good for?

Cons: Where / when is it insufficient?

https://github.com/compTAG/ima-multid
