DESCRIPTION
Synopsis: Topological Data Analysis (TDA) is a framework for data analysis and machine learning that represents a breakthrough in how to effectively use geometric and topological information to solve 'Big Data' problems. TDA provides meaningful summaries (in a technical sense to be described) and insights into complex data problems. In this talk, Anthony will begin with an overview of TDA and describe the core algorithm that is utilized. The talk will cover both the theory and real-world problems that have been solved using TDA. Afterwards, attendees will understand how the underlying TDA algorithm works, how it improves on existing "classical" data analysis techniques, and how it provides a framework for many machine learning algorithms and tasks.

Speaker: Anthony Bak, Senior Data Scientist, Ayasdi. Prior to coming to Ayasdi, Anthony was at Stanford University, where he did a postdoc with Ayasdi co-founder Gunnar Carlsson, working on new methods and applications of Topological Data Analysis. He completed his Ph.D. in algebraic geometry with applications to string theory at the University of Pennsylvania and, along the way, worked at the Max Planck Institute in Germany, Mount Holyoke College in Massachusetts, and the American Institute of Mathematics in California.
Shape and Meaning: An Introduction to Topological Data Analysis
Anthony Bak
Goals

For this talk I want to:
- Show you how TDA provides a framework for many machine learning/data analysis techniques
- Demonstrate how Ayasdi provides insights into the data.

Caveat: I am only talking about the strain of TDA done by Ayasdi.
The Data Problem

How do we extract meaning from complex data?
- Data is complex because it's "Big Data"
- Or has very rich features (e.g. genetic data: >500,000 features, complicated interdependencies)
- Or both!

The problem in both cases is that there isn't a single story happening in your data. TDA will be the tool that summarizes out the irrelevant stories to get at something interesting.
Data Has Shape
And Shape Has Meaning
⇒ In this talk I will focus on how we extract meaning.
But first... What is shape?
- Shape is the global realization of local constraints.
- For a given problem it is determined by a choice of:
  - Features: columns or properties to measure.
  - A metric on the columns.
- But not necessarily so. There are more relaxed definitions of shape and we can use those too.

The goal of TDA is to understand (for us, summarize) the shape with no preconceived model of what it should be.
Math World

To show you how we extract insight from shape, we start in "Math World":
- We'll draw the data as a smooth manifold.
- Functions that appear are smooth or continuous.

⇒ We will not need either of these assumptions once we're in "Data World".
⇒ Even more importantly, data in the real world is never like this.
[Figure, built up over several slides: a point cloud ("Data") is mapped by a lens f to a line; a value p in the range is pulled back to its inverse image f−1(p), whose connected components become the nodes of the summary (⇒). A second lens g and further values q, p′, q′, r′ repeat the same construction with a different lens.]
Exercise:

What is the summary if we use both lenses, g and f, at the same time?

(g, f) ⇒ We recover the original space.
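Using two lenses at once amounts to covering the plane of (g, f) values instead of a line. A minimal, hypothetical sketch (the function name and interval choices are illustrative, not Ayasdi's API) of how two 1-D covers combine:

```python
def product_cover(cover_g, cover_f):
    """Combine two 1-D interval covers (one per lens) into a cover of
    the plane by overlapping rectangles.  Each rectangle collects the
    points whose (g, f) values land in the corresponding interval pair."""
    return [((ga, gb), (fa, fb))
            for ga, gb in cover_g
            for fa, fb in cover_f]

# Two coarse covers of [0, 1], each pair of neighbours overlapping:
cover_g = [(0.0, 0.6), (0.4, 1.0)]
cover_f = [(0.0, 0.6), (0.4, 1.0)]
rectangles = product_cover(cover_g, cover_f)  # 2 x 2 = 4 rectangles
```

With enough lenses the rectangles become so discriminating that each contains essentially one piece of the data, which is why a rich enough set of lenses recovers the original space.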
What did the exercise tell us?
- With a rich enough set of functions (lenses) we can recover the original space.
- Of course this leaves us no better off than where we started.

⇒ Instead we select a set of functions to tune in to the signal we want.
This is what Ayasdi does:

[Figure: the construction from before — data, lenses f and g, inverse images f−1(p) collapsed to nodes of a summary.]

Modulo some details....
Why is this useful?
⇒ We get "easy" understanding of the localizations of quantities of interest.
[Figure, built up over several slides: summaries of the same data under lenses f and g, highlighting where quantities of interest localize in each summary.]
Why is this useful?
- Lenses inform us where in the space to look for phenomena.
- For easy localizations many different lenses will be informative.
- For hard (= geometrically distributed) localizations we have to be more careful. But even then, we frequently get incremental knowledge even from a poorly chosen lens.
Modulo Details....

We want to move from this mathematical model to a data-driven setup.
Step 1
- Replace points in the range with an open covering of the range.
- Connect nodes when their corresponding sets intersect.

⇒ The output is now a graph.
[Figure: the range of f covered by three overlapping intervals U1, U2, U3; each interval's preimage contributes nodes, and intersecting intervals produce edges.]
New Parameters

We've introduced new parameters into the construction:
- The resolution is the number of open sets in the range.
- The gain is the amount of overlap of these intervals.

Roughly speaking, the resolution controls the number of nodes in the output and the 'size' of feature you can pick out, while the gain controls the number of edges and the 'tightness' of the graph.
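A minimal sketch of how resolution and gain determine the cover (the function name and the exact overlap formula are illustrative, not Ayasdi's implementation):

```python
def cover_intervals(lo, hi, resolution, gain):
    """Cover [lo, hi] with `resolution` equal-length overlapping intervals.
    `gain` is the fraction of each interval shared with its neighbour
    (0 = disjoint intervals, 0.5 = each interval overlaps half its length)."""
    length = (hi - lo) / (resolution - gain * (resolution - 1))
    step = length * (1 - gain)
    return [(lo + i * step, lo + i * step + length)
            for i in range(resolution)]

# Resolution 4, gain 0.5 over the range [0, 1]:
# (0.0, 0.4), (0.2, 0.6), (0.4, 0.8), (0.6, 1.0)
```

Raising `resolution` shrinks each interval, so smaller features become visible; raising `gain` widens the overlaps, so more preimages intersect and the graph gains edges.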
Resolution: A closer look

[Figure: the same lens range covered with four intervals U1–U4, then refined with additional intervals — higher resolution means more, smaller cover sets and hence more nodes.]
Step 2: Clustering as π0

We need to make a final adjustment to the algorithm to bring it into data world.
- We replace "connected component of the inverse image" with "clusters in the inverse image".
- We connect clusters (nodes) with an edge if they share points in common.
[Figure, built up over several slides: cover sets U1, U2 pulled back through f; nodes are clusters of data points and edges represent shared points between the clusters.]
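Putting Steps 1 and 2 together, the whole construction fits in a short sketch. This is a toy, self-contained version under simplifying assumptions (a 1-D lens and naive single-linkage clustering with a fixed threshold `eps`) — not Ayasdi's implementation:

```python
import math
from itertools import combinations

def mapper(data, lens, resolution=4, gain=0.5, eps=0.6):
    """Toy Mapper: cover the lens range with overlapping intervals,
    single-linkage-cluster each preimage, and connect clusters that
    share data points.  Returns (nodes, edges); each node is a
    frozenset of indices into `data`."""
    values = [lens(p) for p in data]
    lo, hi = min(values), max(values)
    length = (hi - lo) / (resolution - gain * (resolution - 1))
    step = length * (1 - gain)
    nodes = []
    for i in range(resolution):
        a, b = lo + i * step, lo + i * step + length
        idxs = [j for j, v in enumerate(values) if a <= v <= b]
        clusters = []  # single-linkage clustering of the preimage
        for j in idxs:
            near = [c for c in clusters
                    if any(math.dist(data[j], data[k]) <= eps for k in c)]
            merged = set().union({j}, *near)
            clusters = [c for c in clusters if c not in near] + [merged]
        nodes.extend(frozenset(c) for c in clusters)
    edges = {(m, n) for m, n in combinations(range(len(nodes)), 2)
             if nodes[m] & nodes[n]}
    return nodes, edges

# A noiseless circle with the height lens p -> p[0] is summarized as a
# 6-node cycle: the loop in the data survives in the graph.
circle = [(math.cos(math.radians(a)), math.sin(math.radians(a)))
          for a in range(0, 360, 30)]
nodes, edges = mapper(circle, lens=lambda p: p[0])
```

The circle example shows the point of the construction: the summary is much smaller than the data, yet the essential shape (here, a loop) is preserved.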
That’s It
Ok not quite...
Lenses: Where do they come from
The technique rests on finding good lenses.
⇒ Luckily lots of people have worked on this problem
Lenses: Where do they come from
- Standard data analysis functions
- Geometry and Topology
- Modern Statistics
- Domain Knowledge / Data Modeling

A Non-Exhaustive Table of Lenses

Statistics     Geometry          Machine Learning               Data Driven
Mean/Max/Min   Centrality        PCA/SVD                        Age
Variance       Curvature         Autoencoders                   Dates
n-Moment       Harmonic Cycles   Isomap/MDS/TSNE                ...
Density        ...               SVM Distance from Hyperplane   Error/Debugging Info
...                              ...                            ...
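Two of the table's lenses are easy to sketch in a few lines. These pure-Python versions (the names and the `sigma` default are illustrative) show that a lens is just a function assigning each data point a number:

```python
import math

def density_lens(data, sigma=0.5):
    """Gaussian kernel density estimate at each point (the 'Density' lens).
    High values sit in the thick of the data, low values at its fringes."""
    return [sum(math.exp(-math.dist(p, q) ** 2 / (2 * sigma ** 2)) for q in data)
            for p in data]

def centrality_lens(data):
    """L1-centrality (the 'Centrality' lens): total distance from a point
    to every other point.  Outliers get the largest values."""
    return [sum(math.dist(p, q) for q in data) for p in data]
```

Feeding either list of values into the cover-and-cluster construction of Steps 1 and 2 yields a different summary of the same data — which is exactly why the choice of lens matters.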
Interpretability and Meaning

But what about insight? Meaning?

The units on the lens give interpretability/meaning to the topological summary. Applying a lens f to the same complex data:
- f is Gaussian density ⇒ the data is bi-modal.
- f is centrality ⇒ the data has two ways of being abnormal.
- f is the mean ⇒ two groups of high-mean data.
- f is error ⇒ two types of error.
Interpretability and Meaning

Another way to think about lenses is as a kind of 'geometric query'.

Examples
1. Heart disease study
   - Stratification by age without making arbitrary cutoffs.
2. Heavy machinery
   - Use mean and variance as a lens to find which operating regimes lead to failure of mechanical components.
Interperability and Meaning
Another way to think about lenses is as a kind of ’geometric query’.
Examples
1. Heart disease study
I Stratification by age without making arbitrary cutoffs.
2. Heavy machinery
I Use mean a variance as a lens to find what operating regimes lead to failure ofmechanical components.
Interperability and Meaning
Another way to think about lenses is as a kind of ’geometric query’.
Examples
1. Heart disease studyI Stratification by age without making arbitrary cutoffs.
2. Heavy machinery
I Use mean a variance as a lens to find what operating regimes lead to failure ofmechanical components.
Interperability and Meaning
Another way to think about lenses is as a kind of ’geometric query’.
Examples
1. Heart disease studyI Stratification by age without making arbitrary cutoffs.
2. Heavy machinery
I Use mean a variance as a lens to find what operating regimes lead to failure ofmechanical components.
Interperability and Meaning
Another way to think about lenses is as a kind of ’geometric query’.
Examples
1. Heart disease studyI Stratification by age without making arbitrary cutoffs.
2. Heavy machineryI Use mean a variance as a lens to find what operating regimes lead to failure of
mechanical components.
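For the heavy-machinery example, a mean-and-variance lens can be computed directly from a sensor trace. A minimal sketch, with a hypothetical signal and window size chosen for illustration:

```python
import numpy as np

def mean_variance_lens(signal, window=50):
    """Slice a 1-D sensor trace into fixed-size windows and return the
    (mean, variance) of each window: a 2-D lens whose coordinates keep
    the units of the original signal."""
    n = len(signal) // window
    chunks = np.asarray(signal[: n * window]).reshape(n, window)
    return np.column_stack([chunks.mean(axis=1), chunks.var(axis=1)])

# A steady operating regime followed by a noisy one:
# the variance coordinate of the lens separates the two.
rng = np.random.default_rng(1)
trace = np.concatenate([np.full(200, 5.0), rng.normal(5.0, 2.0, 200)])
lens = mean_variance_lens(trace, window=50)   # shape (8, 2)
```

Querying the high-variance end of this lens is the "geometric query" for unstable operating regimes.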
Some generalizations and extensions

Metrics
I We don’t need a metric, just a notion of similarity - or perhaps a clustering mechanism.

Lenses
I Lenses don’t need to be continuous - just "sensible".
I We can use multiple lenses at the same time.
I Lenses can map to spaces other than R.
I In fact, we can work with "open covers" of the space (here taken to mean overlapping partitions); we don’t need a lens at all.

Data
I The input space can be anything with a topology. Typically we work with row/column numeric/categorical data but, for example, graphs are OK.

Output
I The output of the algorithm isn’t just a graph but an abstract simplicial complex (swept under the rug in this presentation).
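The pipeline described in this talk (a lens, an overlapping cover of its range, clustering within each piece, and nodes joined when they share points) is the Mapper construction. A minimal numpy-only sketch, with a naive union-find clustering standing in for a real clustering mechanism and all parameter values chosen purely for illustration:

```python
import numpy as np
from collections import defaultdict

def _cluster(pts, idx, eps):
    """Naive single-linkage clustering: merge points closer than eps."""
    parent = list(range(len(idx)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(len(idx)):
        for j in range(i + 1, len(idx)):
            if np.linalg.norm(pts[i] - pts[j]) < eps:
                parent[find(i)] = find(j)
    groups = defaultdict(set)
    for k in range(len(idx)):
        groups[find(k)].add(int(idx[k]))
    return list(groups.values())

def mapper_graph(X, lens, n_bins=4, overlap=0.25, eps=1.0):
    """Cover the lens range with overlapping intervals, cluster each
    preimage, and connect clusters that share data points."""
    f = lens(X)
    lo, hi = f.min(), f.max()
    width = (hi - lo) / n_bins
    nodes = []                       # each node is a set of row indices
    for i in range(n_bins):
        a = lo + i * width - overlap * width
        b = lo + (i + 1) * width + overlap * width
        idx = np.where((f >= a) & (f <= b))[0]
        if len(idx):
            nodes.extend(_cluster(X[idx], idx, eps))
    edges = [(i, j) for i in range(len(nodes))
             for j in range(i + 1, len(nodes)) if nodes[i] & nodes[j]]
    return nodes, edges

# Two well-separated blobs; the lens is the first coordinate.
X = np.array([[0.0, 0], [0.1, 0], [0.2, 0],
              [5.0, 0], [5.1, 0], [5.2, 0]])
nodes, edges = mapper_graph(X, lens=lambda X: X[:, 0],
                            n_bins=2, overlap=0.25, eps=0.5)
# -> two nodes, no edges: the summary recovers the two components.
```

Strictly, overlapping cover elements with shared points in three or more nodes would produce higher simplices; the sketch keeps only the graph (1-skeleton), mirroring what the slides show.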
Demo

Online Fraud: Fraud Score
Online Fraud: Charge Back (Ground Truth)
Online Fraud: Time On Page
Online Fraud: No Flash
Online Fraud: No Javascript

Parkinson’s Detection with Mobile Phone