1
Feature Screening Concept: A greedy feature selection method. Rank features and discard those whose ranking criterions are below the threshold. Problem: What is a good ranking criterion (relevance measure or feature weight)? Intuition: Large feature weight if data are well separated along that feature direction Observations: Decision boundary h(s) encodes all discriminative information. h(s) of SVM has an analytical form. Boundary normal identifies the direction along which the data are locally well separated around the neighborhood of boundary point s. Conclusions: Given any direction u, a local relevance measure can be defined as the “consistency” between N(s) and u (e.g. |u T N(s)|, u T N(s)N(s) T u). Decision Boundary Scatter Matrix (DBSM) summarizes local discriminative directions over the whole decision boundary. Given any direction u, a global relevance measure can be defined as the “consistency” between M and u (e.g. u T Mu). Cervical Cancer Detection Using SVM Based Feature Screening Jiayong Zhang & Yanxi Liu, The Robotics Institute, Carnegie Mellon University Introduction Annually, over 50 million Pap smears are done in US and over 60 million in the rest of the world. Finding abnormal cells in Pap smear images remains to be a “needle in a hay- stack” type of problem. Highly accurate, automated screening systems are in great need. Previous works mostly extract shape features at the cellular level in accordance with the “Bethesda System” rules. However, due to image segmentation errors, cellular shape analysis can be rather difficult. We investigate this problem on a novel image modality (multispectral), and propose a bottom-up approach to automatically detect cancerous regions without the requirement of accurate segmentation. By exploring an initial image feature space of nearly 4,000 dimensions that captures local multispectral and texture information, we found that existing feature subset selection algorithms are computationally challenged by such large sized feature set. One alternative is to use simple feature screening measures, e.g. Information Gain (IG) and Augmented Variance Ratio (AVR), to rule out irrelevant features. However, by evaluating each feature independently, they may fail to capture all highly discriminative subsets, which could be composed of individually less discriminative features. In this work, we present a novel feature screening algorithm by deriving relevance measures from the decision boundary of Support Vector Machines. Advantages: • Relevance measures (feature weights) derived simultaneously for all dimensions • Optimal in Structural Risk Minimization sense Better discriminative power indicator • Efficient SVM training Little sacrifice in computational cost Image Preproces sing Pixel Classific ation Blockwise Feature Extraction Feature Screening/Selectio n Classification Multispectral Pap Smear Images Background Segmentation Intensity Normalization Cancerous Regions Candidate Region Detection Region Merging Region Detection 400 nm ~ 690 nm, evenly divided into 52 bands Multispectral Texture Features Statistics (10): maximum, minimum, range, median, mean, standard deviation, energy, skewness, kurtosis and entropy. Wavelets (4): DB2 and DB16 (Orthogonal), Bior2.2 (Bi- orthogonal), Gabor (Non-orthogonal). These features are generated per pixel, per spectral band. Detection System Overview Applying sequential backward selection to surviving features of screening procedure leads to further reduction in subset sizes. Conclusion We show the effectiveness of image feature screening/selection in cancerous cell detection on a novel image modality (multispectral). An initial set of around 4,000 multispectral texture features is effectively reduced to a computationally manageable size. Comparative experiments show significant improvements on pixel-level classification accuracy using the new feature screening method. A much larger PAP smear image set and an even richer image feature space will be used to further validate our method. 0 20 40 60 80 100 120 140 1 5 9 13 17 B and N um berofS elections 0 20 40 60 80 100 120 140 160 180 DB2 D B 16 B ior22 G abor Types ofW avelets N um berofS elections 4 20 36 52 68 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 Num b er of Featu res A verage False P ositive R ate TP R = 0.95 O riginal O utlierRem oved CR TPR FPR # cells 147/149 40/41 1/108 ratio % 98.7 97.6 0.09 DB2 DB16 Bior2 .2 Gabo r Combine d Original Dimensions 800 800 900 1200 3700 After IG Screening 74 76 71 51 144 After AVR Screening 47 35 41 21 45 After SVM Screening 48 42 52 30 68 Evaluation Region-level detection. Leave-one-out system evaluation. Various dimensions before and after feature screening. Analysis of the selected feature subsets with respect to their feature type and spectral band distribution provides some insights into the interpretations of the results. M ean,37% M edian,31% E ntropy,20% R ange,4% M ax,3% Min,3% S td,2% E nergy,0% S kew ness,0% K urtosis,0% () () () N h h s s s () ()() T B M NsN spsds Acknowledgments This research was funded in part by Pennsylvania Department of Health grant ME01-738; and in part by National Institute of Health (NIH) grant N01-CO-07119. 0 0.1 0.2 0.3 0.4 0.5 0.86 0.88 0.9 0.92 0.94 0.96 0.98 1 False Positive R ate (FP R ) True P ositive R ate (TP R ) SVM Based S creening IG and A VR S creening Pixel-level classification. Comparison between SVM and IG+AVR screenings. 1 () () (,) 0 n i i i i h b yK b x w x xx An example of cancerous region detection. (a) Original image. (b) Scaled output surface from discriminative filtering. (c) Gaussian smoothing of (b). (d) Local maxima points found in (c). (e) Contours of candidate cancerous regions. (f) Merged result. (a) (b) (c) (d) (e) (f)

Feature Screening

Embed Size (px)

DESCRIPTION

Multispectral Pap Smear Images. Background Segmentation. Image Preprocessing. Intensity Normalization. Blockwise Feature Extraction. Pixel Classification. Feature Screening/Selection. Classification. Candidate Region Detection. Region Detection. Region Merging. Cancerous Regions. - PowerPoint PPT Presentation

Citation preview

Page 1: Feature Screening

Feature ScreeningConcept: A greedy feature selection method. Rank features and discard those whose ranking criterions are below the threshold.

Problem: What is a good ranking criterion (relevance measure or feature weight)?

Intuition: Large feature weight if data are well separated along that feature direction

Observations:• Decision boundary h(s) encodes all discriminative information. • h(s) of SVM has an analytical form.

• Boundary normal identifies the direction along which the data are locally well separated around the neighborhood of boundary point s.

Conclusions: • Given any direction u, a local relevance measure can be defined as the “consistency” between N(s) and u (e.g. |uTN(s)|, uTN(s)N(s)Tu).• Decision Boundary Scatter Matrix (DBSM) summarizes local discriminative directions over the whole decision boundary.

• Given any direction u, a global relevance measure can be defined as the “consistency” between M and u (e.g. uTMu).

Cervical Cancer Detection Using SVM Based Feature Screening

Jiayong Zhang & Yanxi Liu, The Robotics Institute, Carnegie Mellon University

IntroductionAnnually, over 50 million Pap smears are done in US and over 60 million in the rest of the world. Finding abnormal cells in Pap smear images remains to be a “needle in a hay-stack” type of problem. Highly accurate, automated screening systems are in great need.

Previous works mostly extract shape features at the cellular level in accordance with the “Bethesda System” rules. However, due to image segmentation errors, cellular shape analysis can be rather difficult.

We investigate this problem on a novel image modality (multispectral), and propose a bottom-up approach to automatically detect cancerous regions without the requirement of accurate segmentation.

By exploring an initial image feature space of nearly 4,000 dimensions that captures local multispectral and texture information, we found that existing feature subset selection algorithms are computationally challenged by such large sized feature set.

One alternative is to use simple feature screening measures, e.g. Information Gain (IG) and Augmented Variance Ratio (AVR), to rule out irrelevant features. However, by evaluating each feature independently, they may fail to capture all highly discriminative subsets, which could be composed of individually less discriminative features.

In this work, we present a novel feature screening algorithm by deriving relevance measures from the decision boundary of Support Vector Machines. Advantages: • Relevance measures (feature weights) derived simultaneously for all dimensions• Optimal in Structural Risk Minimization sense Better discriminative power indicator• Efficient SVM training Little sacrifice in computational cost

Image Preprocessing

Pixel Classification

Blockwise Feature Extraction

Feature Screening/Selection

Classification

Multispectral Pap Smear Images

Background Segmentation

Intensity Normalization

Cancerous Regions

Candidate Region Detection

Region Merging

Region Detection

400 nm ~ 690 nm, evenly divided into 52 bands

Multispectral Texture Features

• Statistics (10): maximum, minimum, range, median, mean, standard deviation, energy, skewness, kurtosis and entropy.

• Wavelets (4): DB2 and DB16 (Orthogonal), Bior2.2 (Bi-orthogonal), Gabor (Non-orthogonal).

These features are generated per pixel, per spectral band.

Detection System Overview

Applying sequential backward selection to surviving features of screening procedure leads to further reduction in subset sizes.

ConclusionWe show the effectiveness of image feature screening/selection in cancerous cell detection on a novel image modality (multispectral). An initial set of around 4,000 multispectral texture features is effectively reduced to a computationally manageable size. Comparative experiments show significant improvements on pixel-level classification accuracy using the new feature screening method. A much larger PAP smear image set and an even richer image feature space will be used to further validate our method.

0 20 40 60 80 100 120 140

1

5

9

13

17

Ban

d

Number of Selections0 20 40 60 80 100 120 140 160 180

DB2

DB16

Bior22

Gabor

Typ

es o

f W

avel

ets

Number of Selections

4 20 36 52 680.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

Number of Features

Av

era

ge

Fa

lse

Po

sit

ive

Rat

e

TPR = 0.95

OriginalOutlier Removed

CR TPR FPR

# cells 147/149 40/41 1/108

ratio % 98.7 97.6 0.09

DB2 DB16 Bior2.2 Gabor Combined

Original Dimensions 800 800 900 1200 3700

After IG Screening 74 76 71 51 144

After AVR Screening 47 35 41 21 45

After SVM Screening 48 42 52 30 68

Evaluation

Region-level detection. Leave-one-out system evaluation.

Various dimensions before and after feature screening.

Analysis of the selected feature subsets with respect to their feature type and spectral band distribution provides some insights into the interpretations of the results.

Mean, 37%

Median, 31%

Entropy, 20%

Range, 4%

Max, 3%

Min, 3%

Std, 2%

Energy, 0%

Skewness, 0%

Kurtosis, 0%

( ) ( ) ( )N h h s s s

( ) ( ) ( )T

BM N s N s p s ds

AcknowledgmentsThis research was funded in part by Pennsylvania Department of Health grant ME01-738; and in part by National Institute of Health (NIH) grant N01-CO-07119.

0 0.1 0.2 0.3 0.4 0.50.86

0.88

0.9

0.92

0.94

0.96

0.98

1

False Positive Rate (FPR)

Tru

e P

ositi

ve R

ate

(TP

R)

SVM Based ScreeningIG and AVR Screening

Pixel-level classification. Comparison between SVM and IG+AVR screenings.

1( ) ( ) ( , ) 0

n

i i iih b y K b

x w x x x

An example of cancerous region detection. (a) Original image. (b) Scaled output surface from discriminative filtering. (c) Gaussian smoothing of (b). (d) Local maxima points found in (c). (e) Contours of candidate cancerous regions. (f) Merged result.

(a) (b) (c) (d) (e) (f)