A Novel Approach to Novelty Detection: Voronoi Tessellation

Jeffrey.D.Scargle@nasa.gov
Space Science Division

NASA Ames Research Center

Collaborator: Nikunj Oza, NASA-Ames Research Center, ICPureSense, Inc.

Machine Learning Seminar

The Basic Ideas

- Nonparametric density estimation
- Voronoi tessellation
- Voronoi cells as point surrogates:
  - 1 / area of cell ~ local point density
  - cell geometry ~ local density gradient
- Tessellate the training points plus 1 test point.
- If the Voronoi cell assigned to the test point is an “edge” cell (unbounded, extending to infinity), the test point is an outlier; otherwise it is “normal”.
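In one dimension the Voronoi cell of a point is simply the interval between the midpoints to its left and right neighbors, and only the two outermost cells are unbounded, so the ideas above can be illustrated without a geometry library. The following is a minimal Python sketch under that 1-D assumption (distinct values); `voronoi_1d`, `local_density` and `is_outlier` are hypothetical names, and cell length plays the role of cell area.

```python
# Hypothetical 1-D sketch: assumes distinct scalar values.
INF = float("inf")


def voronoi_1d(points):
    """Voronoi cells of 1-D points as (point, left_edge, right_edge), sorted.

    Cell boundaries are midpoints between neighbors; the outermost cells
    extend to -inf / +inf, i.e. they are the "edge" cells.
    """
    xs = sorted(points)
    mids = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    lefts = [-INF] + mids
    rights = mids + [INF]
    return list(zip(xs, lefts, rights))


def local_density(points):
    """Density estimate 1 / (cell length) per point; 0 for unbounded cells."""
    out = {}
    for x, lo, hi in voronoi_1d(points):
        length = hi - lo
        out[x] = 0.0 if length == INF else 1.0 / length
    return out


def is_outlier(train, test_point):
    """Tessellate training points plus one test point; outlier iff its cell is unbounded."""
    for x, lo, hi in voronoi_1d(list(train) + [test_point]):
        if x == test_point:
            return lo == -INF or hi == INF
    raise ValueError("test point not found")
```

For example, `is_outlier([0, 1, 2, 3], 10)` is `True` and `is_outlier([0, 1, 2, 3], 1.5)` is `False`.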

Modes of Operation (1)

Static training data and test data

Modes of Operation (2)

Training data = all past data
Test data = one new data point

Modes of Operation (3)

Training data = past data of fixed size
Test data = one new data point
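Modes 2 and 3 differ only in how much history serves as training data. A minimal sketch, assuming scalar data with distinct values and using the 1-D fact that a point's Voronoi cell is unbounded exactly when no training point lies on one side of it; `stream_novelty`, `edge_cell_1d`, `window` and `warmup` are hypothetical names. `window=None` keeps all past data (mode 2), while an integer keeps only that many recent points (mode 3).

```python
from collections import deque


def edge_cell_1d(history, x):
    """Hypothetical 1-D edge test: x's Voronoi cell is unbounded iff every
    training point lies on the same side of x. Assumes distinct values."""
    return all(h < x for h in history) or all(h > x for h in history)


def stream_novelty(stream, window=None, warmup=2):
    """Yield (x, is_novel) for each new point after the first `warmup` points.

    window=None: training data = all past data (mode 2).
    window=k:    training data = the k most recent points (mode 3).
    """
    history = deque(maxlen=window)  # maxlen=None means unbounded
    for i, x in enumerate(stream):
        if i >= warmup:
            yield x, edge_cell_1d(history, x)
        history.append(x)  # the new point joins the training data afterwards
```

On the stream `[1, 5, 3, 4, 10, 2]`, mode 2 flags only `10`; mode 3 with `window=2` also flags the final `2`, because by then the window holds only `4` and `10`.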

Voronoi Tessellation of data in any dimension

Construct Voronoi cells to represent local photon density:

density ~ 1 / cell area

Voronoi cells also represent local photon density gradients

The Voronoi cells are a local representation of the data …

Selecting the smallest Voronoi cells yields the regions of highest photon density …
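In two dimensions the relation density ~ 1 / cell area can be computed directly. This sketch assumes NumPy and SciPy (whose `scipy.spatial.Voronoi` wraps Qhull, the same library behind MATLAB's `voronoin`); `voronoi_cell_areas` and `densest_points` are hypothetical names. Unbounded cells get infinite area, i.e. zero density, and because every Voronoi cell is convex its area equals the area of the convex hull of its vertices.

```python
import numpy as np
from scipy.spatial import ConvexHull, Voronoi


def voronoi_cell_areas(points):
    """Hypothetical helper: area of each point's 2-D Voronoi cell,
    np.inf for unbounded (edge) cells."""
    vor = Voronoi(points)
    areas = np.full(len(points), np.inf)
    for i, region_index in enumerate(vor.point_region):
        region = vor.regions[region_index]
        if region and -1 not in region:  # -1 marks a vertex at infinity
            # Cells are convex, so the hull of the cell's vertices is the cell;
            # for 2-D input, ConvexHull.volume is the enclosed area.
            areas[i] = ConvexHull(vor.vertices[region]).volume
    return areas


def densest_points(points, n):
    """Hypothetical helper: indices of the n smallest cells (highest 1/area)."""
    return np.argsort(voronoi_cell_areas(points))[:n]
```

`densest_points(points, 5)` returns the indices of the five smallest cells, i.e. the highest-density regions.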

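MATLAB code

% Abnormal test data: each point should fall in an unbounded ("edge") cell
num_train = size( train_data, 1 );   % one data point per row
num_test  = size( test_data, 1 );
count_correct = 0;  count_error = 0;
for id = 1:num_test
    % Tessellate the training points plus one test point (appended as the last row)
    data = [ train_data; test_data( id, : ) ];
    [ vertices, v_cells ] = voronoin( data );
    vertices_last = v_cells{ num_train + 1 };
    % voronoin stores the point at infinity as vertex #1, so a cell
    % that uses vertex 1 is unbounded
    if any( vertices_last == 1 )
        count_correct = count_correct + 1;
    else
        count_error = count_error + 1;
    end
end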
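For readers without MATLAB, roughly the same loop can be written with SciPy. This is a sketch, not the author's code: it assumes one data point per row and NumPy/SciPy installed, and `cell_is_edge` and `score_abnormal` are hypothetical names. SciPy marks the vertex at infinity with `-1` in `Voronoi.regions`, where MATLAB's `voronoin` uses row 1 of the vertex array (all `Inf`).

```python
import numpy as np
from scipy.spatial import Voronoi


def cell_is_edge(train_data, test_point):
    """Hypothetical helper: tessellate training points plus one test point
    (appended as the last row); True if the test point's cell is unbounded."""
    data = np.vstack([train_data, test_point])
    vor = Voronoi(data)
    region = vor.regions[vor.point_region[len(train_data)]]
    return -1 in region  # -1 = vertex at infinity in SciPy's convention


def score_abnormal(train_data, test_data):
    """Hypothetical helper: abnormal test points should land in edge cells."""
    count_correct = count_error = 0
    for test_point in test_data:
        if cell_is_edge(train_data, test_point):
            count_correct += 1
        else:
            count_error += 1
    return count_correct, count_error
```

Each test point is tessellated separately with the full training set, so the cost grows with the number of test points times the cost of one tessellation.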

Biomed dataset: Cox, Johnson and Kafadar (1982), “Exposition of statistical graphics technology,” ASA Proceedings of the Statistical Computing Section, pp. 55-56

                          67 Abnormal Inputs    27 Normal Inputs
Method                    Correct    Wrong      Correct    Wrong
-----------------------------------------------------------------
Kernel Classifier         57         10         25         2
Grow When Required net    56         11         25         2
Voronoi, mean             57.2       9.8        17.6       9.3
Voronoi, best             60         7          25         2
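The counts in the Biomed table convert directly into a detection rate on the 67 abnormal inputs and a false-alarm rate on the 27 normal inputs. A small sketch of that arithmetic; `rates` is a hypothetical name. It shows that the best Voronoi run detects more abnormal inputs than the other two methods at the same false-alarm rate, while the mean Voronoi run has a much higher false-alarm rate (9.3 of 27, about 34%).

```python
N_ABNORMAL, N_NORMAL = 67, 27  # totals from the table header

# (abnormal correct, abnormal wrong, normal correct, normal wrong)
results = {
    "Kernel Classifier": (57, 10, 25, 2),
    "Grow When Required net": (56, 11, 25, 2),
    "Voronoi, mean": (57.2, 9.8, 17.6, 9.3),
    "Voronoi, best": (60, 7, 25, 2),
}


def rates(row):
    """Hypothetical helper: (detection rate, false-alarm rate).

    Divides by the header totals rather than row sums, because the averaged
    Voronoi row's normal counts sum to 26.9 after rounding."""
    abn_correct, _, _, norm_wrong = row
    return abn_correct / N_ABNORMAL, norm_wrong / N_NORMAL


for name, row in results.items():
    detection, false_alarm = rates(row)
    print(f"{name:24s} detection {detection:.1%}  false alarm {false_alarm:.1%}")
```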

Curse of Dimensionality?

Computation time for Voronoi tessellation is roughly linear in the number of data points.

But it is a much steeper function of the dimensionality.

In the “ball bearing” data set (following example) the dimensionality of the raw data is 32. I used singular value decomposition (SVD) to reduce the dimensionality.
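A minimal NumPy sketch of the reduction step, assuming the data are mean-centred before the SVD (the slide does not say whether they were) and that k, the number of retained dimensions, is chosen by the user; `svd_reduce` and `svd_project` are hypothetical names. Test points must be projected with the same components and mean as the training data before they join the tessellation.

```python
import numpy as np


def svd_reduce(X, k):
    """Hypothetical helper: project rows of X (n_samples x n_features) onto
    the top-k right singular vectors of the mean-centred data (an assumption)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Economy-size SVD: Xc = U @ diag(s) @ Vt, singular values in descending order
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]  # k x n_features, orthonormal rows
    return Xc @ components.T, components, mean


def svd_project(x_new, components, mean):
    """Hypothetical helper: apply the training projection to new points."""
    return (np.atleast_2d(x_new) - mean) @ components.T
```

For the 32-dimensional ball-bearing data, `svd_reduce(X, 3)` would give the kind of 3-dimensional input used in the first Voronoi row of the following table.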

Ball-bearing data: EPSRC Structural Integrity & Damage Assessment, www.brunel.ac.uk/research/cnca/sida/html/data.htm

                            Normal    Broken    Damaged    Basket       ½ runs
Method                      (New)     Ring      Basket     destroyed    loosely
-------------------------------------------------------------------------------
Linear programming kernel   1.3%      0%        46.7%      71.7%        74.5%
Grow When Required net      37.8%     40.3%     43.8%      4.6%         4.9%
Voronoi, d = 3              1.6%      0%        30.7%      30.7%        35.7%
Voronoi, d = 4              6.4%      0%        12.1%      16.2%        19.9%
Voronoi, d = 5              13.5%     0.7%      25.5%      28.9%        34.2%

LPDD*   0%?   0%   8.3%

Novelty Detection in Time Series

Multivariate Time Series

For a single time series, use embedding: it captures the dynamical behavior of the process, but increases the dimensionality.

$X(t_n) \mapsto \{X(t_n), X(t_{n+1}), X(t_{n+2}), \ldots, X(t_{n+k-1})\}$

Online Novelty Detection on Temporal Sequences, Junshui Ma and Simon Perkins, SIGKDD 2003

Better:

$X(t_n) \mapsto \{X(t_n), X(t_{n+m}), X(t_{n+2m}), \ldots, X(t_{n+(k-1)m})\}$
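Both embeddings are the same operation with a different lag. A minimal Python sketch; `delay_embed` is a hypothetical name, k is the embedding dimension and m the lag, with m = 1 giving the consecutive-sample form. Each embedded vector can then be treated as one point in a k-dimensional tessellation.

```python
def delay_embed(x, k, m=1):
    """Hypothetical helper: row n is (x[n], x[n+m], ..., x[n+(k-1)m]).

    m=1 gives consecutive samples; m>1 spreads the k samples over a longer
    stretch of the series. Returns a list of k-tuples (empty if x is too short).
    """
    span = (k - 1) * m  # distance from first to last sample in one vector
    return [tuple(x[n + j * m] for j in range(k)) for n in range(len(x) - span)]
```

For example, `delay_embed([0, 1, 2, 3, 4, 5], k=3, m=2)` returns `[(0, 2, 4), (1, 3, 5)]`.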

Recommended