Adaptive Sampling with Topological Scores

Preview:

DESCRIPTION

Adaptive Sampling with Topological Scores. Dan Maljovec 1 , Bei Wang 1 , Ana Kupresanin 2 , Gardar Johannesson 2 , Valerio Pascucci 1 , and Peer- Timo Bremer 2 1 SCI Institute, University of Utah 2 Lawrence Livermore National Laboratory. 1. Framework:. 4. 2. 3. - PowerPoint PPT Presentation

Citation preview

Adaptive Sampling with Topological Scores

Dan Maljovec1, Bei Wang1, Ana Kupresanin2, Gardar Johannesson2, Valerio Pascucci1, and Peer-Timo

Bremer2

1SCI Institute, University of Utah2Lawrence Livermore National Laboratory

Select training data & run simulation

Fit a Predicting Model (PM)

Select candidates & predict response values from PM

Score candidates & select candidate to add to training data

1

2

3

4

Framework:

Choosing the “Right” Points4 Models fit by using 10 training points:

True Response Model:

Space-filling SamplingNo prior knowledge of the dataset

Where should we sample the model?

Space-filling SamplingFill the domain space as evenly as possible

Space-filling SamplingObtain responses and fit model

Space-filling Sampling1 round:

Space-filling Sampling2 rounds:

Space-filling Sampling3 rounds:

Space-filling Sampling4 rounds:

Space-filling Sampling5 rounds:

Space-filling SamplingRefit after 5 rounds of “filling voids”:

Space-filling SamplingWhat have we learned?

Initial fit Refit after adding 5 points

Topologically-Inspired Adaptive Sampling

Starting over

Topologically-Inspired Adaptive Sampling

1 round:

Topologically-Inspired Adaptive Sampling

2 rounds:

Topologically-Inspired Adaptive Sampling

3 rounds:

Topologically-Inspired Adaptive Sampling

4 rounds:

Topologically-Inspired Adaptive Sampling

5 rounds:

Topologically-Inspired Adaptive Sampling

Refit after 5 rounds of choosing the most topologically significant point:

What have we learned?

Initial fit Refit after adding 5 points

Topologically-Inspired Adaptive Sampling

Space-filling Method Adaptive Method

Comparison

Initial Fit

Space-filling Method Adaptive Method

Comparison

True Response Model

Measuring Topological SignificanceThese points were selected in topologically significant regions:

How do we measure topological impact?

Morse-Smale ComplexA partition of the domain into monotonic regions

Stable Manifolds Unstable Manifolds Morse-Smale Complex

∩ =

Computing the Morse-Smale Complex

Stable Manifolds Unstable Manifolds Morse-Smale Complex

∩ =

Traditionally:1. Compute steepest ascent/descent gradient from each point in

dataset2. Color points twice once by following ascending gradient to

maximum & another by descending gradient to minimum3. Intersect partitions to form Morse-Smale Complex

Computing the Morse-Smale ComplexTraditionally:

1.a. Compute steepest ascent gradient from each point in dataset2.a. Color points based on maximum where flow halts

Computing the Morse-Smale ComplexTraditionally:

1.b. Compute steepest descent gradient from each point in dataset2.b. Color points based on minimum where flow halts

Computing the Morse-Smale Complex

Stable Manifolds Unstable Manifolds Morse-Smale Complex

∩ =

Traditionally:3. Intersect partitions to form Morse-Smale Complex

Computing the Morse-Smale Complex

Stable Manifolds Unstable Manifolds Morse-Smale Complex

∩ =

Traditionally:1. Compute steepest ascent/descent gradient from each point in

dataset Assumes points have a neighborhood2. Color points twice once by following ascending gradient to

maximum & another by descending gradient to minimum3. Intersect partitions to form Morse-Smale Complex

Computing the Approximate Morse-Smale Complex

Stable Manifolds Unstable Manifolds Morse-Smale Complex

∩ =

Modified Algorithm:1. Compute approximate KNN for a point2. Compute steepest ascent/descent gradient from each point’s

neighborhood in dataset3. Color points by gradient flows4. Intersect partitions to form Morse-Smale Complex

0 1 2 30

1

2

3

x

y

a b

c

d

Tracking Persistence

Count the number of connected components belonging to the sub-level sets of the function

0 1 2 30

1

2

3

x

y

a b

c

d

Tracking Persistence

Every connected component has a birth

0 1 2 30

1

2

3

x

y

a b

c

d

Tracking Persistence

Every connected component has a birth

0 1 2 30

1

2

3

x

y

a b

c

d

Tracking Persistence

And every connected component has a death

0 1 2 30

1

2

3

x

y

a b

c

d

Tracking Persistence

Births and deaths of connected components are tied to critical points of the function

0 1 2 30

1

2

3

x

y

a b

c

d

Tracking Persistence

Births and deaths of connected components are tied to critical points of the function

0 1 2 30

1

2

3

x

y

a b

c

d

Tracking Persistence

0 1 2 30

1

2

3

0 1 2 30

1

2

3

x

y

birth

death

a b

c

d

Tracking Persistence

A persistence diagram plots birth-death pairings

The function value difference between the critical point pairings is what we define as the persistence of a feature

In surfaces (and higher dimension corollaries), we have saddle points

Persistence Simplification in 2D

Persistence Simplification in 2DWe can smooth away noise features by thresholding

critical point pairs above a certain persistence level

Persistence Simplification in 2D

0 1 2 30

1

2

3

x

y

birth

death

0 1 2 30

1

2

3

Bottleneck Distance

Comparing two similar functions persistence diagrams

0 1 2 30

1

2

3

0 1 2 30

1

2

3

x

y

birth

death

Bottleneck Distance

Optimization: minimize the distance between pairs of points in the two plots including the line y=x in both pairs

Bottleneck distance – the maximum distance resulting from this pairing

Select training data & run simulation

Fit a Predicting Model (PM)

Select candidates & predict response values from PM

Score candidates & select candidate to add to training data

1

2

3

4

Framework:

Selecting Initial Training DataTests use Latin Hypercube Sampling

Fitting a Statistical ModelExperimented with:• Gaussian Process Model– Sparse Online Gaussian Process (C++)

• 2 Implementations of MARS with bootstrapping– Earth (CRAN)– MDA (CRAN)

• Neural Network with bootstrapping– NNet (CRAN)

Classical ScoringActive-Learning McKay (ALM)– Sample high-frequency or low-confidence regions

Delta– Distribute samples in the range space or areas of steep

gradientExpected Improvement (EI)– Select points with high uncertainty or large discrepancy

with existing dataDistance (*DP)– Scaling factor applied to above, creating 3 new scoring

functions (ALMDP, DeltaDP, EIDP)

Topological ScoringTOPOP– Average change in persistence of each point computed

by comparing the Morse-Smale before and after inserting a candidate

TOPOB– Bottleneck distance between persistence diagram of

Morse-Smale before and after inserting a candidate point

TOPOHP– Highest persistence critical point in Morse-Smale

complex constructed from combined training and predicted candidate data

Topological ScoringAverage Change in Persistence (TOPOP)– Average change in persistence of each point

computed by comparing the Morse-Smale before and after inserting a candidate

Topological ScoringAverage Change in Persistence (TOPOP)– Average change in persistence of each point

computed by comparing the Morse-Smale before and after inserting a candidate

Topological ScoringBottleneck Distance in Persistence (TOPOB)– Bottleneck distance between persistence diagram of

Morse-Smale before and after inserting a candidate point

Topological ScoringBottleneck Distance in Persistence (TOPOB)– Bottleneck distance between persistence diagram of

Morse-Smale before and after inserting a candidate point

Topological ScoringBottleneck Distance in Persistence (TOPOB)– Bottleneck distance between persistence diagram of

Morse-Smale before and after inserting a candidate point

Topological ScoringHighest Persistence (TOPOHP)– Find highest persistence critical point in Morse-

Smale complex constructed from combined training and predicted candidate data

Initial Test Functions

Initial Test FunctionsDiagnosis:Little hope of modeling the complexity of some of these functions, especially in higher dimensions.

Test FunctionsAckley Diagonal 2 Max Diagonal 4 Max

2-Component GMM

5-Component GMM

Mis-Scaled Generalized

Rastrigin (MGR)

Salomon Generalized Rosenbrock

Comparing Model ResultsGPM Earth

NNet MDA

GPM Results

GPM Results

GPM Results

Moving ForwardLimitations

– Computation time of approximate K-nearest neighborhood is cost-prohibitive to build for every candidate point (TOPOB & TOPOP)• Morse-Smale computation is limited by number of extrema which need processed

– Topology is based on current prediction modelUnknowns

– Good parameter selection (k for KNN, parameters of prior)Future Work

– Integrate visualizations to better understand how points are being selected and gain intuition as to how to improve results

– Better neighborhood estimates– Investigate other means of measuring topological impact– Fitting local models based on topological partitioning– Using better models– Hybrid scoring or hybrid sampling techniques

Recommended