
Slide 1

Feature Selection for Image Retrieval

By Karina Zapién Arreola

January 21st, 2005

Slide 2

Introduction

Variable and feature selection have become the focus of much research in application areas where datasets with many variables are available:

Text processing

Gene expression

Combinatorial chemistry

Slide 3

Motivation

The objective of feature selection is three-fold:

Improving the prediction performance of the predictors

Providing faster and more cost-effective predictors

Providing a better understanding of the underlying process that generated the data

Slide 4

Why use feature selection in CBIR (content-based image retrieval)?

Different users may need different features for image retrieval

From each selected sample, a specific feature set can be chosen

Slide 5

Boosting

Method for improving the accuracy of any learning algorithm

Use of “weak” learners that encode single, simple rules

Weighting of the weak algorithms

Combination of weak rules into a strong learning algorithm

Slide 6

Adaboost Algorithm

AdaBoost is an iterative boosting algorithm. Notation:

Samples (x1, y1), …, (xn, yn), where yi ∈ {−1, +1}; of these, m are positive and l are negative (n = m + l)

Candidate weak classifiers hj, one per feature

For iteration t, the error is defined as:

εt = minj ½ Σi ωi |hj(xi) − yi|

where ωi is the weight of sample xi.

Slide 7

Adaboost Algorithm

Given samples (x1, y1), …, (xn, yn), where yi ∈ {−1, +1}

Initialize ω1,i = 1/(2m) for yi = +1 and ω1,i = 1/(2l) for yi = −1

For t = 1, …, T:

• Normalize ωt,i ← ωt,i / Σj ωt,j

• Train each candidate weak learner hj using the distribution ωt

• Choose the ht that minimizes the weighted error εt; set ei = 0 if xi is classified correctly, 1 otherwise

• Set βt = εt / (1 − εt) and αt = log(1 / βt)

• Update ωt+1,i = ωt,i · βt^(1−ei)

Output the final classifier H(x) = sign( Σt αt ht(x) )
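To make the loop above concrete, here is a minimal Python sketch of discrete AdaBoost with one mean-based weak classifier per feature column (the weak classifier defined on the later slides); the names adaboost_select and adaboost_predict are illustrative, not from the original talk.

```python
import numpy as np

def adaboost_select(X, y, T=30, eps=1e-12):
    """Discrete AdaBoost with one mean-based weak classifier per feature.

    X: (n, d) feature matrix, y: (n,) labels in {-1, +1}.
    Returns the chosen feature indices, their weights alpha, and the
    (pos_mean, neg_mean) pair of each selected weak classifier."""
    n, d = X.shape
    m = np.sum(y == 1)                      # number of positive samples
    l = np.sum(y == -1)                     # number of negative samples
    w = np.where(y == 1, 1.0 / (2 * m), 1.0 / (2 * l))   # initial weights

    selected, alphas, means = [], [], []
    for t in range(T):
        w = w / w.sum()                     # normalize the distribution
        best_err, best_j, best_means, best_pred = np.inf, None, None, None
        for j in range(d):                  # train one weak learner per feature
            pos_mean = X[y == 1, j].mean()
            neg_mean = X[y == -1, j].mean()
            # classify each sample by the closer class mean
            pred = np.where(np.abs(X[:, j] - pos_mean) <=
                            np.abs(X[:, j] - neg_mean), 1, -1)
            err = 0.5 * np.sum(w * np.abs(pred - y))     # weighted error eps_t
            if err < best_err:
                best_err, best_j = err, j
                best_means, best_pred = (pos_mean, neg_mean), pred
        e = (best_pred != y).astype(float)  # e_i = 0 if correct, 1 otherwise
        beta = best_err / max(1.0 - best_err, eps)
        alpha = np.log(1.0 / max(beta, eps))
        w = w * beta ** (1.0 - e)           # keep weight on misclassified samples
        selected.append(best_j)
        alphas.append(alpha)
        means.append(best_means)
    return selected, alphas, means


def adaboost_predict(X, selected, alphas, means):
    """Strong classifier H(x) = sign(sum_t alpha_t * h_t(x))."""
    score = np.zeros(X.shape[0])
    for j, a, (pm, nm) in zip(selected, alphas, means):
        h = np.where(np.abs(X[:, j] - pm) <= np.abs(X[:, j] - nm), 1, -1)
        score += a * h
    return np.sign(score)
```

With T rounds, at most T feature columns are ever selected, which is the feature-selection effect of boosting exploited in this talk.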

Slide 8

Adaboost Application

Searching similar groups: a particular image class is chosen

A positive sample is drawn randomly from this group

A negative sample is drawn randomly from the remaining images
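As an illustration of this sampling step, a small helper that assembles such a training set, assuming the image collection is already described by a feature matrix with one integer class label per row; the function name and arguments are hypothetical.

```python
import numpy as np

def sample_training_set(features, labels, target_class,
                        n_pos=15, n_neg=100, rng=None):
    """Randomly draw positive samples from the chosen image class and
    negative samples from all remaining images, labeled +1 / -1."""
    rng = np.random.default_rng(rng)
    pos_idx = np.flatnonzero(labels == target_class)
    neg_idx = np.flatnonzero(labels != target_class)
    pos = rng.choice(pos_idx, size=min(n_pos, pos_idx.size), replace=False)
    neg = rng.choice(neg_idx, size=min(n_neg, neg_idx.size), replace=False)
    X = np.vstack([features[pos], features[neg]])
    y = np.concatenate([np.ones(pos.size), -np.ones(neg.size)])
    return X, y
```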

Slide 9

Check list: Feature Selection

• Domain knowledge
• Commensurate features
• Interdependence of features
• Pruning of input variables
• Assess features individually
• Dirty data
• Predictor – linear predictor
• Comparison
• Stable solution

Slide 10

Domain knowledge

Features used:
• colordb_sum
• RGB_entropy_d1
• col_gpd_hsv
• col_gpd_lab
• col_gpd_rgb
• col_hu_hsv2
• col_hu_lab2
• col_hu_lab
• col_hu_rgb2
• col_hu_rgb
• col_hu_seg2_hsv
• col_hu_seg2_lab
• col_hu_seg2_rgb
• col_hu_seg_hsv
• col_hu_seg_lab
• col_hu_seg_rgb
• col_hu_yiq
• col_ngcm_rgb
• col_sm_hsv
• col_sm_lab
• col_sm_rgb
• col_sm_yiq
• text_gabor
• text_tamura
• edgeDB
• waveletDB
• hist_phc_hsv
• hist_phc_rgb
• Hist_Grad_RGB
• haar_RGB
• haar_HSV
• haar_rgb
• haar_hmmd

Slide 11

Check list: Feature Selection

Domain knowledge

Commensurate features

Normalize features to an appropriate range

AdaBoost treats each feature independently, so it is not necessary to normalize them

Slide 12

Check list: Feature Selection

• Domain knowledge
• Commensurate features
• Interdependence of features
• Pruning of input variables
• Assess features individually
• Dirty data
• Predictor – linear predictor
• Comparison
• Stable solution

Slide 13

Feature construction and space dimensionality reduction

Clustering

Correlation coefficient

Supervised feature selection

Filters

Slide 14

Check list: Feature Selection

Domain knowledge

Commensurate features

Interdependence of features

Pruning of input variables

Features with the same value for all samples (variance = 0) were eliminated

From 4912 linear features, 3583 were selected
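A minimal sketch of this pruning step, assuming the features are collected in a NumPy matrix with one row per sample; the helper name is illustrative.

```python
import numpy as np

def prune_constant_features(X, tol=0.0):
    """Drop feature columns whose variance is <= tol,
    i.e. features with the same value for every sample."""
    keep = np.flatnonzero(X.var(axis=0) > tol)
    return X[:, keep], keep

# e.g. reducing the 4912 linear features to the non-constant subset
# X_pruned, kept_columns = prune_constant_features(X)
```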

Slide 15

Check list: Feature Selection

• Domain knowledge
• Commensurate features
• Interdependence of features
• Pruning of input variables
• Assess features individually

When no assessment method is available, use a variable ranking method. With AdaBoost this step is not necessary.

Slide 16

Variable Ranking

Preprocessing step

Independent of the choice of the predictor

Correlation criteria: can only detect linear dependencies

Single variable classifiers
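For reference, a sketch of correlation-based variable ranking (not part of the AdaBoost pipeline itself, which performs its own selection); it assumes a feature matrix X and labels y in {−1, +1}, and, as noted above, only captures linear dependencies.

```python
import numpy as np

def rank_by_correlation(X, y):
    """Rank features by absolute Pearson correlation with the label.
    Captures only linear dependencies between a feature and y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    r = (Xc * yc[:, None]).sum(axis=0) / denom
    return np.argsort(-np.abs(r))   # indices of the best-ranked features first
```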

Slide 17

Variable Ranking

Noise reduction and better classification may be obtained by adding variables that are presumably redundant

Perfectly correlated variables are truly redundant in the sense that no additional information is gained by adding them. This does not imply an absence of variable complementarity

Two variables that are useless by themselves can be useful together
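A tiny illustration of that last point, using hypothetical XOR-like data: each variable alone is uncorrelated with the label, but the pair separates the classes perfectly.

```python
import numpy as np

# XOR-like toy data: neither x1 nor x2 alone predicts y, but their product does
x1 = np.array([1,  1, -1, -1])
x2 = np.array([1, -1,  1, -1])
y = x1 * x2                            # labels in {-1, +1}

print(np.corrcoef(x1, y)[0, 1])        # 0.0 -> x1 alone is uninformative
print(np.corrcoef(x2, y)[0, 1])        # 0.0 -> x2 alone is uninformative
print(np.all(np.sign(x1 * x2) == y))   # True -> together they separate perfectly
```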

Slide 18

Check list: Feature Selection

• Domain knowledge
• Commensurate features
• Interdependence of features
• Pruning of input variables
• Assess features individually
• Dirty data
• Predictor – linear predictor
• Comparison
• Stable solution

Slide 19

Check list: Feature Selection

• Domain knowledge
• Commensurate features
• Interdependence of features
• Pruning of input variables
• Assess features individually
• Dirty data
• Predictor – linear predictor
• Comparison
• Stable solution

Slide 20

Adaboost Algorithm

Given samples (x1, y1), …, (xn, yn), where yi ∈ {−1, +1}

Initialize ω1,i = 1/(2m) for yi = +1 and ω1,i = 1/(2l) for yi = −1

For t = 1, …, T:

• Normalize ωt,i ← ωt,i / Σj ωt,j

• Train each candidate weak learner hj using the distribution ωt

• Choose the ht that minimizes the weighted error εt; set ei = 0 if xi is classified correctly, 1 otherwise

• Set βt = εt / (1 − εt) and αt = log(1 / βt)

• Update ωt+1,i = ωt,i · βt^(1−ei)

Output the final classifier H(x) = sign( Σt αt ht(x) )

Slide 21

Weak classifier

Each weak classifier hi is defined as follows:

hi.pos_mean – mean value for positive samples

hi.neg_mean – mean value for negative samples

A sample is classified as:

• +1 if its feature value is closer to hi.pos_mean
• −1 if it is closer to hi.neg_mean
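A small Python sketch of this weak classifier, assuming each classifier looks at a single feature column; the class name is illustrative, and the fields mirror the slide's hi.pos_mean / hi.neg_mean.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class MeanWeakClassifier:
    """Weak classifier over a single feature: predicts +1 when a value is
    closer to the mean of the positive samples, -1 when it is closer to
    the mean of the negative samples."""
    pos_mean: float
    neg_mean: float

    @classmethod
    def fit(cls, feature_values, y):
        # feature_values: (n,) values of one feature; y: labels in {-1, +1}
        return cls(pos_mean=float(feature_values[y == 1].mean()),
                   neg_mean=float(feature_values[y == -1].mean()))

    def predict(self, feature_values):
        closer_to_pos = (np.abs(feature_values - self.pos_mean) <=
                         np.abs(feature_values - self.neg_mean))
        return np.where(closer_to_pos, 1, -1)
```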

Slide 22

Weak classifier

hi.pos_mean – mean value for positive samples

hi.neg_mean – mean value for negative samples

A linear classifier was used

[Diagram: one feature axis with hi.neg_mean and hi.pos_mean marked; the decision boundary lies between the two means]

Slide 23

Check list: Feature Selection

• Domain knowledge
• Commensurate features
• Interdependence of features
• Pruning of input variables
• Assess features individually
• Dirty data
• Predictor – linear predictor
• Comparison
• Stable solution

Slide 24

Adaboost experiments and results

4 positives

10 positives

Slide 25

Few positive samples: use of 4 positive samples

Slide 26

More positive samples: use of 10 positive samples

[Result image; false positives marked]

Slide 27

Training data vs. test data: use of 10 positive samples

[Result images for training and test data; false negatives marked]

Slide 28

Changing number of Training Iterations

The number of iterations tested ranged from 5 to 50

Iterations = 30 was set

Slide 29

Changing Sample Size

Result panels for 5, 10, 15, 20, 25, 30, and 35 positive samples

Slide 30

Few negative samples: use of 15 negative samples

Slide 31

More negative samples: use of 75 negative samples

Slide 32

Check list: Feature Selection

• Domain knowledge
• Commensurate features
• Interdependence of features
• Pruning of input variables
• Assess features individually
• Dirty data
• Predictor – linear predictor
• Comparison (ideas, time, computational resources, examples)
• Stable solution

Slide 33

Stable solution

For AdaBoost it is important to have a representative sample

Chosen parameters:
• Positive samples: 15
• Negative samples: 100
• Iteration number: 30

Slide 34

Stable solution with more samples and iterations

Beaches, Dinosaurs, Mountains, Elephants, Buildings, Humans, Roses, Buses, Horses, Food

Slide 35

Stable solution for Dinosaurs
Use of:
• 15 positive samples
• 100 negative samples
• 30 iterations

Slide 36

Stable solution for Roses
Use of:
• 15 positive samples
• 100 negative samples
• 30 iterations

Slide 37

Stable solution for Buses
Use of:
• 15 positive samples
• 100 negative samples
• 30 iterations

Slide 38

Stable solution for Beaches
Use of:
• 15 positive samples
• 100 negative samples
• 30 iterations

Slide 39

Stable solution for Food
Use of:
• 15 positive samples
• 100 negative samples
• 30 iterations

Slide 40

Unstable Solution

Slide 41

Unstable solution for Roses

Use of:
• 5 positive samples
• 10 negative samples
• 30 iterations

Slide 42

Best features for classification

Humans, Beaches, Buildings, Buses, Dinosaurs, Elephants, Roses, Horses, Mountains, Food

Slide 43

And the winner is…

Slide 44

Feature frequency

[Bar chart “Feature's Frequency”: relative appearance frequency of each feature (y axis from 0 to 0.2, labeled “Appearance times”; x axis labeled “Feature”). Features shown, in plot order: Haar_rgb_norm, haar_hmmd, hist_Grad_RGB, haar_RGB, haar_HSV, hist_phc_hsv, hist_phc_rgb, 03-col_gpd_hsv.mat, 03-col_sm_yiq.mat, 03-col_hu_yiq.mat, 03-col_hu_seg_hsv.mat, 03-col_sm_lab.mat, 04-text_tamura.mat, 05-edgeDB, 03-col_sm_rgb.mat, 03-col_gpd_rgb.mat, 03-col_gpd_lab.mat, 05-waveletDB, 03-col_hu_lab.mat, 03-col_hu_seg_rgb.mat, 03-col_ngcm_rgb.mat, 03-col_hu_seg_lab.mat, 04-text_gabor.mat]
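A chart like this can be produced by tallying which feature file each selected weak classifier came from across the AdaBoost runs; a rough sketch, assuming the adaboost_select helper from the earlier slide and a mapping from feature-column index to feature-file name (both illustrative).

```python
from collections import Counter

def feature_frequency(runs, column_to_file):
    """Count how often each feature file is picked across AdaBoost runs.

    runs: list of 'selected' index lists returned by adaboost_select;
    column_to_file: maps a feature-column index to its source file name."""
    counts = Counter()
    for selected in runs:
        for j in selected:
            counts[column_to_file[j]] += 1
    total = sum(counts.values())
    # relative appearance frequency per feature file, as plotted above
    return {name: c / total for name, c in counts.items()}
```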

Slide 45

Extensions

Searching similar images: pairs of images are built

The difference for each feature is calculated

Each difference vector is classified as: +1 if both images belong to the same class, −1 if they belong to different classes (a sketch follows this list)

Multiclass adaboost
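A sketch of the pairwise construction described in the list above: for every pair of images the feature-wise difference (here the absolute difference, one possible choice) becomes a training vector, labeled +1 for same-class pairs and −1 otherwise; the helper name is hypothetical.

```python
import numpy as np
from itertools import combinations

def build_pair_differences(features, labels):
    """Turn single images into pairwise samples: the absolute feature-wise
    difference |f_i - f_j|, labeled +1 if the two images share a class
    and -1 otherwise."""
    diffs, pair_labels = [], []
    for i, j in combinations(range(len(labels)), 2):
        diffs.append(np.abs(features[i] - features[j]))
        pair_labels.append(1 if labels[i] == labels[j] else -1)
    return np.asarray(diffs), np.asarray(pair_labels)
```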

Slide 46

Extensions

Use of another weak classifier: design weak classifiers that use multiple features

→ classifier fusion

Use different weak classifiers such as SVM, NN, a threshold function, etc.

Different feature selection method: SVM

Slide 47

Discussion

It is important to add feature selection to image retrieval

A good methodology for selecting features should be used

Adaboost is a learning algorithm

→ data dependent

It is important to have representative samples

Adaboost can help to improve the classification potential of simple algorithms

Slide 48

Thank you!