(Rare) Category Detection Using Hierarchical Mean Shift


Pavan Vatturi (vatturi@eecs.oregonstate.edu)

Weng-Keen Wong (wong@eecs.oregonstate.edu)

1. Introduction

• Applications in surveillance, monitoring, scientific discovery, and data cleaning require anomaly detection

• Anomalies are often identified as statistically unusual data points

• Many anomalies are simply irrelevant or correspond to known sources of noise

1. Introduction

[Figure: example images from the Sloan Digital Sky Survey (http://www.sdss.org/iotw/archive.html) illustrating the breakdown: known objects make up 99.9% of the data and anomalies 0.1%; of those anomalies, 99% are uninteresting and only 1% are interesting.]

Pelleg, D. (2004). Scalable and Practical Probability Density Estimators for Scientific Anomaly Detection. PhD Thesis, Carnegie Mellon University.

1. Introduction

Category Detection [Pelleg and Moore 2004]: human-in-the-loop exploratory data analysis

Data Set → Build Model → Spot Interesting Data Points → Ask User to Label Categories of Interesting Data Points → Update Model with Labels → (repeat)

1. Introduction

(Same loop: Data Set → Build Model → Spot Interesting Data Points → Ask User to Label → Update Model with Labels)

User can:

• Label a query data point under an existing category

• Or declare the data point to belong to a previously undeclared category

1. Introduction

• Goal: present to the user a single instance from each category in as few queries as possible

• Difficult to detect rare categories if class imbalance is severe

• Interested in rare categories for anomaly detection

Outline

1. Introduction

2. Related Work

3. Background

4. Methodology

5. Results

6. Conclusion / Future Work

2. Related Work

• Interleave [Pelleg and Moore 2004]

• Nearest-Neighbor-based active learning for rare-category detection for multiple classes [He and Carbonell 2008]

• Multiple output identification [Fine and Mansour 2006]

3. Background: Mean Shift [Fukunaga and Hostetler 1975]

[Figure: a query point among a reference data set; the mean shift vector points from the query point toward the kernel-weighted center of mass, following the density gradient.]

Mean shift vector with kernel profile k (derivative k'), bandwidth h, and reference points x_1, …, x_n:

$$
m(x) = \frac{\sum_{i=1}^{n} x_i \, k'\!\left(\left\|\frac{x - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} k'\!\left(\left\|\frac{x - x_i}{h}\right\|^2\right)} - x
$$
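To make the update concrete, here is a minimal sketch of mean shift for a single query point, assuming a Gaussian kernel (so the weight k'(·) reduces to exp(-||x - x_i||²/2h²), up to a constant that cancels in the ratio); the function name and stopping criteria are illustrative, not from the paper:

```python
import numpy as np

def mean_shift_point(x, X, h, tol=1e-6, max_iter=100):
    """Follow the mean shift vector from query point x to a density mode.

    x : (d,) query point, X : (n, d) reference data set, h : bandwidth.
    Gaussian kernel, so the weight for each x_i is exp(-||x - x_i||^2 / 2h^2).
    """
    for _ in range(max_iter):
        # Kernel weights for every reference point x_i.
        w = np.exp(-np.sum((X - x) ** 2, axis=1) / (2.0 * h ** 2))
        # New position = kernel-weighted center of mass of the data;
        # the mean shift vector m(x) is (x_new - x).
        x_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x  # approximate mode (cluster center) reached from x
```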

3. Background: Mean Shift [Fukunaga and Hostetler 1975]

[Figure: repeating the update moves the query point through successive centers of mass until it converges to a cluster center, i.e. a mode of the density.]

3. Background: Mean Shift Blurring

[Figure: blurring mean shift applied to a reference data set.]

Blurring:
• When the query points are the same as the reference data set
• Progressively blurs the original data set
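A minimal sketch of one blurring step under the same Gaussian-kernel assumption; here every point plays the role of a query point against the current data set:

```python
import numpy as np

def blur_step(X, h):
    """One blurring mean shift step: every point of X is simultaneously
    replaced by its Gaussian-kernel-weighted center of mass, computed
    against the current (already blurred) data set itself."""
    # All pairwise squared distances, shape (n, n).
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.exp(-d2 / (2.0 * h ** 2))                 # kernel weights
    return (W @ X) / W.sum(axis=1, keepdims=True)    # new positions
```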

3. Background: Mean Shift

[Figure: end result of applying mean shift to a synthetic data set.]

4. Methodology: Overview

1. Sphere the data (see the whitening sketch after this list)

2. Hierarchical Mean Shift

3. Query user
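Step 1, sphering, is standard whitening; a sketch assuming full-covariance (ZCA-style) whitening with NumPy:

```python
import numpy as np

def sphere(X):
    """Whiten the data to zero mean and identity covariance, so a single
    scalar bandwidth h is comparable across all dimensions."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)       # cov = V diag(vals) V^T
    inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(np.maximum(vals, 1e-12))) @ vecs.T
    return Xc @ inv_sqrt                   # sphered data
```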

4. Methodology: Hierarchical Mean Shift

Repeatedly blur the data using mean shift with increasing bandwidth: h_new = k · h_old
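A sketch of the resulting hierarchical loop, reusing blur_step from the blurring sketch above; the merge tolerance, inner iteration count, and default k are illustrative choices, not values from the paper:

```python
import numpy as np

def hierarchical_mean_shift(X, h0, k=1.4, merge_tol=1e-3, max_levels=30):
    """Build a cluster hierarchy by repeated blurring with growing bandwidth.

    At each level the data set is blurred at bandwidth h, points that have
    collapsed onto (numerically) the same mode form one cluster, and the
    bandwidth then grows geometrically: h_new = k * h_old.
    """
    levels, h, Xb = [], h0, X.copy()
    for _ in range(max_levels):
        for _ in range(10):            # blur toward convergence at this h
            Xb = blur_step(Xb, h)
        # Cluster id per point: points sharing a rounded mode share a cluster.
        keys = np.round(Xb / merge_tol).astype(np.int64)
        _, labels = np.unique(keys, axis=0, return_inverse=True)
        levels.append((h, labels))
        if labels.max() == 0:          # everything merged: hierarchy is done
            break
        h *= k                         # h_new = k * h_old
    return levels                      # [(bandwidth, cluster labels), ...]
```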

4. Methodology: Hierarchical Mean Shift

• Mean shift complexity is O(n²dm), where:
  n = # of data points
  d = dimensionality of data points
  m = # of mean shift iterations

• A single kd-tree optimization is used to speed up Hierarchical Mean Shift (see the sketch below)
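The paper's single kd-tree optimization is more involved, but the basic idea of cutting down the O(n²) pairwise pass can be sketched with a range query over a truncated kernel (the cutoff parameter is an illustrative assumption):

```python
import numpy as np
from scipy.spatial import cKDTree

def blur_step_kdtree(X, h, cutoff=3.0):
    """Blurring step whose kernel sums only run over kd-tree range-query
    neighbors within cutoff * h of each point (a truncated Gaussian),
    avoiding the full O(n^2) pairwise pass when h is small."""
    tree = cKDTree(X)
    X_new = np.empty_like(X)
    for i, x in enumerate(X):
        idx = tree.query_ball_point(x, cutoff * h)  # neighbors within radius
        nbrs = X[idx]                               # always includes x itself
        w = np.exp(-np.sum((nbrs - x) ** 2, axis=1) / (2.0 * h ** 2))
        X_new[i] = (w[:, None] * nbrs).sum(axis=0) / w.sum()
    return X_new
```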

4. Methodology: Querying the User

Rank cluster centers for querying the user.

1. Outlierness [Leung et al. 2000] for Cluster Ci:

$$
\text{Outlierness}(C_i) = \frac{\text{Lifetime of } C_i}{\text{Number of data points in } C_i}
$$

Lifetime of C_i = log(bandwidth at which C_i is merged into other clusters) - log(bandwidth at which C_i is formed)
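A sketch of the score, reading lifetime as the difference of log bandwidths (a natural reading given the geometric schedule h_new = k · h_old); all names are illustrative:

```python
import numpy as np

def outlierness(h_formed, h_merged, sizes):
    """Outlierness per cluster = lifetime / cluster size.

    h_formed[i] : bandwidth at which cluster C_i first forms
    h_merged[i] : bandwidth at which C_i is merged into other clusters
    sizes[i]    : number of data points in C_i
    """
    lifetime = np.log(np.asarray(h_merged)) - np.log(np.asarray(h_formed))
    return lifetime / np.asarray(sizes, dtype=float)

# Small clusters that survive across many bandwidth levels score highest,
# which is exactly the signature of a rare, well-separated category.
```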

4. Methodology: Querying the User

Rank cluster centers for querying the user.

2. Compactness + Isolation [Leung et al. 2000] for cluster C_i with mode p_i at bandwidth h:

$$
\text{Compactness}(C_i) = \frac{1}{|C_i|} \sum_{x_j \in C_i} e^{-\frac{\|x_j - p_i\|^2}{2h^2}}
$$

$$
\text{Isolation}(C_i) = \frac{\sum_{x_j \in C_i} e^{-\frac{\|x_j - p_i\|^2}{2h^2}}}{\sum_{x_j} e^{-\frac{\|x_j - p_i\|^2}{2h^2}}}
$$

Compactness is high when a cluster's members lie tightly around its mode; isolation is high when those members account for nearly all of the density mass at the mode.
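A sketch of both quantities under the reconstructions above; the members mask, p, and h are assumptions about how clusters are represented, not the paper's API:

```python
import numpy as np

def compactness_isolation(X, members, p, h):
    """Compactness and isolation for one cluster, per the formulas above.

    X : (n, d) all data points
    members : boolean mask marking the points belonging to the cluster
    p : (d,) cluster mode, h : bandwidth at which the cluster exists
    """
    w = np.exp(-np.sum((X - p) ** 2, axis=1) / (2.0 * h ** 2))
    compactness = w[members].mean()         # average member weight at the mode
    isolation = w[members].sum() / w.sum()  # members' share of all mass at mode
    return compactness, isolation
```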

4. Methodology: Tiebreaker

• Ties may occur in Outlierness or Compactness/Isolation values.

• Highest average distance heuristic: choose cluster center with highest average distance from user-labeled points.
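A minimal sketch of this tiebreaker (names illustrative):

```python
import numpy as np

def break_ties(tied_centers, labeled_points):
    """Highest-average-distance tiebreaker: among equally ranked cluster
    centers, return the one with the largest mean Euclidean distance to
    the points the user has already labeled, steering queries toward
    unexplored regions of the data."""
    centers = np.asarray(tied_centers)      # (t, d) tied cluster centers
    labeled = np.asarray(labeled_points)    # (m, d) user-labeled points
    d = np.linalg.norm(centers[:, None, :] - labeled[None, :, :],
                       axis=2).mean(axis=1)
    return centers[np.argmax(d)]
```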

5. Results

Data sets used in experiments:

Name         Dims   Records   Classes   Smallest Class   Largest Class
Abalone         7      4177        20            0.34%             16%
Shuttle         8      4000         7            0.02%           64.2%
OptDigits      64      1040        10            0.77%             50%
OptLetters     16      2128        26            0.37%             24%
Statlog        19       512         7             1.5%             50%
Yeast           8      1484        10            0.33%          31.68%

Shuttle, OptDigits, OptLetters, and Statlog were subsampled to simulate class imbalance.

5. Results (Yeast)

Category detection metric: the number of queries before the user has been presented with at least one example from every category
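For concreteness, a tiny helper showing how this metric can be computed from the sequence of true labels the user reveals, one per query (illustrative, not from the paper):

```python
def queries_to_cover(labels_in_query_order, n_classes):
    """Number of queries until the user has seen at least one example from
    every category; None if some category is never discovered."""
    seen = set()
    for i, y in enumerate(labels_in_query_order, start=1):
        seen.add(y)
        if len(seen) == n_classes:
            return i
    return None
```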

5. Results (Statlog / OptLetters / OptDigits / Shuttle / Abalone)

[Category detection curves for each of these data sets; summarized in the tables below.]

5. Results

Number of hints to discover all classes:

Dataset      HMS-CI   HMS-CI+HAD   HMS-Out   HMS-Out+HAD   NNDM   Interleave
Abalone        1195           93       603           385    124          193
Shuttle          44           32        36            28    162           35
OptDigits       100          100       160           118    576          117
OptLetters      133          133       161           182    420          489
Statlog          18           20        34           124    228           54
Yeast            73           91       103            77     88          111

5. Results

Area under the category detection curve:

Dataset      HMS-CI   HMS-CI+HAD   HMS-Out   NNDM    Interleave
Abalone       0.835        0.873     0.837   0.846        0.840
Shuttle       0.925        0.929     0.917   0.480        0.905
OptDigits     0.855        0.855     0.840   0.199        0.808
OptLetters    0.936        0.936     0.917   0.573        0.765
Statlog       0.956        0.958     0.944   0.472        0.934
Yeast         0.821        0.805     0.793   0.838        0.778

6. Conclusion / Future Work

Conclusions:

• HMS-based methods consistently discover more categories in fewer queries than existing methods

• They do not need a priori knowledge of data set properties, e.g. the total number of classes

6. Conclusion / Future Work

Future Work

• Better use of user feedback

• Presentation of an entire cluster to the user instead of a representative data point

• Improved computational efficiency

• Theoretical analysis
