Shape as Organizing Principle for Data MLConf Seattle 2015 Anthony Bak, Principal Data Scientist

Anthony Bak, Principal Data Scientist at Ayasdi at MLconf SEA - 5/01/15

Shape as Organizing Principle for Data

MLConf Seattle 2015

Anthony Bak, Principal Data Scientist

The Data Problem: Complexity

Solution: Topological Summaries

Shape as Organizing Principle for Data

Reduce Bias, Discover Models

TDA tells you the data you have, not the data you want to have.

Generating Topological Summaries

Remember/Forget

Use multiple lenses/metrics to get the complete picture

Different lenses provide different summaries

Lenses: where do they come from?

Statistics: Mean/Max/Min, Variance, n-Moment, Density

Machine Learning: PCA/SVD, Autoencoders, Isomap/MDS/TSNE

Geometry: Centrality, Curvature, Harmonic Cycles
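
A lens can be read as a function that assigns one number to each data point. The sketch below is only an illustration of that idea, not the speaker's or Ayasdi's implementation. It computes one statistical lens (the mean of each point's coordinates), one density lens (an unnormalized Gaussian kernel estimate), and one geometric lens (L-infinity centrality, the distance to the farthest point). The function names, the sample points, and the `bandwidth` parameter are assumptions made for the example.

```python
import math

def row_mean(points):
    """Statistical lens: mean of each point's coordinates."""
    return [sum(p) / len(p) for p in points]

def gaussian_density(points, bandwidth=1.0):
    """Density lens: unnormalized Gaussian kernel density at each point."""
    return [
        sum(math.exp(-math.dist(p, q) ** 2 / (2 * bandwidth ** 2)) for q in points)
        for p in points
    ]

def linf_centrality(points):
    """Geometric lens: distance from each point to the point farthest from it."""
    return [max(math.dist(p, q) for q in points) for p in points]

# Hypothetical toy data: three points near the origin and one outlier.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
means = row_mean(points)
density = gaussian_density(points)
centrality = linf_centrality(points)
```

Each lens orders the same four points differently: the outlier has the lowest density but shares the highest centrality value with the origin, which is one way to see why different lenses give different summaries.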

Why Topology?

Key Properties of TDA

Deformation Invariance

Compressed Representation

Coordinate Freeness

Coordinate Invariance

1. The topology of a shape doesn’t depend on the coordinates used to describe it.

2. Different feature sets can describe the same phenomena.

3. While processing data, we frequently alter coordinates: scaling, rotating, whitening.

You want to study properties of your data that are invariant under coordinate changes.
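
As a small check of this idea, the sketch below applies two of the coordinate changes named above to a toy point set. It is an illustration under stated assumptions, not part of the talk: the points, the angle, the scale factor, and all function names are hypothetical. A rotation leaves every pairwise distance unchanged, and a uniform scaling changes the distances but not the nearest-to-farthest ordering of neighbors.

```python
import math

def rotate(points, theta):
    """Rotate 2-D points about the origin by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def scale(points, factor):
    """Scale all coordinates by the same factor."""
    return [(factor * x, factor * y) for x, y in points]

def distance_matrix(points):
    return [[math.dist(p, q) for q in points] for p in points]

def neighbor_order(points):
    """For each point, indices of all points from nearest to farthest (itself first)."""
    d = distance_matrix(points)
    n = len(points)
    return [sorted(range(n), key=lambda j: d[i][j]) for i in range(n)]

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (4.0, 3.0)]
rotated = rotate(points, 0.7)
scaled = scale(points, 10.0)

# Rotation: every pairwise distance is preserved up to rounding error.
rotation_keeps_distances = all(
    math.isclose(a, b, abs_tol=1e-9)
    for row_a, row_b in zip(distance_matrix(points), distance_matrix(rotated))
    for a, b in zip(row_a, row_b)
)
# Uniform scaling: distances change, neighbor ordering does not.
scaling_keeps_order = neighbor_order(points) == neighbor_order(scaled)
```

Whitening and other non-uniform changes generally alter both distances and neighbor order, so this check does not cover them; the slide's claim is about topological properties, which are coarser than distances.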

Coordinate Invariance: Gene Expression

NKI

GSE230

Coordinate Invariance: Disease State

Deformation Invariance

• Topological features don’t change when you stretch and distort the data

Advantage: Makes problems easier

Noise resistance

Less pre-processing of data

Robust (stable) data
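
The sketch below illustrates the stretching claim on a toy example. It is an assumption-laden illustration, not the method shown in the talk: points sampled from a circle are stretched into an ellipse, a 2-nearest-neighbor graph stands in for the shape of each point cloud, and the graph's Betti numbers (b0 = connected components, b1 = independent loops) are compared. The function names, the sample size, and the stretch factor are hypothetical.

```python
import math

def knn_graph(points, k=2):
    """Undirected k-nearest-neighbor graph as a set of (i, j) edges with i < j."""
    edges = set()
    for i, p in enumerate(points):
        order = sorted(range(len(points)), key=lambda j: math.dist(p, points[j]))
        for j in order[1:k + 1]:  # order[0] is the point itself
            edges.add((min(i, j), max(i, j)))
    return edges

def betti_numbers(n_vertices, edges):
    """Return (b0, b1) of a graph: connected components and independent cycles."""
    parent = list(range(n_vertices))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in edges:
        parent[find(i)] = find(j)
    b0 = len({find(i) for i in range(n_vertices)})
    b1 = len(edges) - n_vertices + b0  # cycle rank of a graph
    return b0, b1

n = 40
circle = [(math.cos(2 * math.pi * t / n), math.sin(2 * math.pi * t / n)) for t in range(n)]
stretched = [(3.0 * x, y) for x, y in circle]  # deform the circle into an ellipse

circle_betti = betti_numbers(n, knn_graph(circle))
stretched_betti = betti_numbers(n, knn_graph(stretched))
```

Both point clouds yield one component and one loop, even though every coordinate and most distances changed under the stretch.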

Compressed Representation

• Replace the metric space with a combinatorial summary: a simplicial complex.

• Data becomes easier to manage, search, and query while maintaining essential features.

• Leverages many known algorithms from graph theory, computational topology, and computational geometry.
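
One way to picture such a summary is a Mapper-style construction from the TDA literature. The slides' diagrams did not survive extraction, so treating this as the construction they illustrate is an assumption. The sketch covers the range of a lens with overlapping intervals, clusters the points that fall in each interval, makes each cluster a node, and joins two nodes when they share points. Only vertices and edges are built, that is, the 1-dimensional part of a simplicial complex. All names and parameters (`n_intervals`, `overlap`, `eps`) are hypothetical.

```python
import math
from itertools import combinations

def single_linkage(indices, points, eps):
    """Group indices into clusters by chaining points closer than eps."""
    clusters = []
    unvisited = set(indices)
    while unvisited:
        frontier = [unvisited.pop()]
        cluster = set(frontier)
        while frontier:
            i = frontier.pop()
            near = {j for j in unvisited if math.dist(points[i], points[j]) < eps}
            unvisited -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters

def mapper_graph(points, lens, n_intervals=4, overlap=0.3, eps=0.5):
    """Nodes: clusters inside overlapping lens intervals. Edges: clusters sharing points."""
    lo, hi = min(lens), max(lens)
    width = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        a = lo + k * width - overlap * width        # interval start, widened
        b = lo + (k + 1) * width + overlap * width  # interval end, widened
        members = [i for i, v in enumerate(lens) if a <= v <= b]
        nodes.extend(single_linkage(members, points, eps))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]}
    return nodes, edges

n = 60
circle = [(math.cos(2 * math.pi * t / n), math.sin(2 * math.pi * t / n)) for t in range(n)]
lens = [x for x, _ in circle]  # lens: the x-coordinate of each point
nodes, edges = mapper_graph(circle, lens)
```

For this circle, the 60 points compress to six nodes joined in a single loop, so the summary is far smaller than the data while keeping its one essential feature.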

Baby Steps: PCA

Data Stories

Model Introspection

Predictive Maintenance

Customer Churn

Transaction Fraud

We’re Hiring! http://www.ayasdi.com/company/careers/

Data Has Shape

And

Shape Has Meaning