
Lecture Notes on Feature Selection

Rossella Blatt, [email protected]

Department of Electronics and Information, Politecnico di Milano

Methods for Intelligent Systems


Dimensionality Reduction


DIMENSIONALITY REDUCTION: to control the dimensionality of our pattern analysis problem and to improve the classification accuracy

• Feature Extraction: project the data into a lower-dimensional space, obtaining new features
  • PCA: seeks the axes with maximum variance
  • LDA: seeks the axes with maximum between-class distance and minimum within-class distance
• Feature Selection: choose the best subset of the original features


Feature Selection
• It concerns the control of the dimensionality of our pattern analysis problem
• The dimensionality of the problem is mainly given by the sample size and the feature set size
• It would not make sense to reduce the number of examples because:
  • usually we never have enough examples
  • we can assume that the examples are correct
• On the contrary, we have no guarantee that all our features are needed for the classification


Feature Selection
• Refers to algorithms that select (hopefully) the best subset of the initial feature set
• Selected features maintain their original physical interpretation (useful to understand the physical process that generates the patterns)
• It leads to savings in measurement cost
• In the literature, Feature Selection is also called "Feature Subset Selection"
• Given a feature set X = {x_i, i = 1, ..., N}, find a subset Y_M = {x_{i1}, x_{i2}, ..., x_{iM}}, with M < N, that optimizes an objective function J(Y) (in some way related to the probability of correct classification)


$$
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_N \end{bmatrix}
\xrightarrow{\text{Feature Selection}}
\begin{bmatrix} x_{i_1} \\ x_{i_2} \\ \vdots \\ x_{i_M} \end{bmatrix},
\qquad
\{x_{i_1}, x_{i_2}, \ldots, x_{i_M}\} = \arg\max_{M,\, i_m} J\big(\{x_i \mid i = 1, \ldots, N\}\big)
$$


Feature Selection

Given a feature set X = {x_i, i = 1, ..., N}, find a subset Y_M = {x_{i1}, x_{i2}, ..., x_{iM}}, with M < N, that optimizes an objective function J(Y) (in some way related to the probability of correct classification)

1. Find a subset → find a search strategy to select candidate subsets (the algorithm)
2. Optimize an objective function → define a measure of the goodness of the considered subsets (classification accuracy, interclass distance, and so on)

[Diagram: FEATURE SUBSET SELECTION: Training Data → Complete Feature Set → (Search Strategy ⇄ Objective Function) → Final Feature Subset]


Feature Selection: Example

• Goal: recognize oranges from mandarins…


• We collect 10 measurements of 3 features:
  • Weight
  • Color Intensity
  • Diameter


[Figure: ORANGES VS MANDARINS: Feature Weight; x-axis: Weight [grams]; classes: Oranges, Mandarins]

Feature Selection: Example
• We obtain (first row: oranges, second row: mandarins):


Weight [grams] =
  oranges:   124 133 164 127 153 159 160  99 135 120
  mandarins:  78  79  91  92  85  93  78  99  85  80


[Figure: ORANGES VS MANDARINS: Feature Color Intensity; x-axis: Color Intensity; classes: Oranges, Mandarins]

Feature Selection: Example
• We obtain (first row: oranges, second row: mandarins):


Color Intensity =
  oranges:   0.72 0.78 0.77 0.75 0.72 0.78 0.69 0.71 0.75 0.70
  mandarins: 0.78 0.79 0.71 0.72 0.75 0.73 0.78 0.79 0.75 0.78


Feature Selection: Example
• We obtain (first row: oranges, second row: mandarins):


Diameter [cm] =
  oranges:   12.0  8.0  8.7 10.7  9.7  7.8  9.7 11.9 11.2 10.5
  mandarins:  8.1  7.9  7.1  6.7  5.7  7.3  6.8  7.0  7.5  8.0

[Figure: ORANGES VS MANDARINS: Feature Diameter; x-axis: Diameter [cm]; classes: Oranges, Mandarins]


[Figure: ORANGES VS MANDARINS: Features Plot (3D); axes: Weight [grams], Color Intensity, Diameter [cm]; classes: Oranges, Mandarins]

Feature Selection: Example
• The feature space that we obtain is a 3-dimensional space


[Figure: ORANGES VS MANDARINS: Features Plot (3D); axes: Weight [grams], Color Intensity, Diameter [cm]; classes: Oranges, Mandarins]

Feature Selection: Example
• We could perform our classification task in the obtained 3-dimensional feature space, or...


Feature Selection: Example
• We could perform our classification task in the obtained 3-dimensional feature space, or in a reduced 2-dimensional space, or...


[Figures: ORANGES VS MANDARINS: 2D feature plots: Color Intensity vs Diameter, Weight vs Diameter, Weight vs Color Intensity; classes: Oranges, Mandarins]


[Figure: ORANGES VS MANDARINS: Feature Color Intensity; x-axis: Color Intensity; classes: Oranges, Mandarins]

Feature Selection: Example
• We could perform our classification task in the obtained 3-dimensional feature space, or in a reduced 2-dimensional space, or even in a mono-dimensional space...

[Figures: ORANGES VS MANDARINS: Feature Weight (x-axis: Weight [grams]) and Feature Diameter (x-axis: Diameter [cm]); classes: Oranges, Mandarins]

• Which one is the best for our classification purpose?!


Best Firsts: a naïve approach
• One possible Feature Selection technique is to rank the features individually and then keep only the first M features (a small scoring sketch follows below)
• In our previous example the best features are (in decreasing order):
  1. Weight (very good)
  2. Diameter (quite good)
  3. Color (very bad)
• If we want to keep only 2 features, 'Weight' and 'Diameter' will be the winners
• But... this approach almost always fails, because it does not consider features with complementary information

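The notes do not fix a particular ranking criterion; as a minimal illustration, the sketch below scores each feature individually with a two-class Fisher-style ratio (squared difference of class means over the sum of class variances) on the oranges/mandarins measurements listed above. The fisher_score helper and the variable names are illustrative, not part of the original handout.

```python
import numpy as np

# Measurements from the example slides (first row: oranges, second row: mandarins)
weight = np.array([[124, 133, 164, 127, 153, 159, 160, 99, 135, 120],
                   [78, 79, 91, 92, 85, 93, 78, 99, 85, 80]], dtype=float)
color = np.array([[0.72, 0.78, 0.77, 0.75, 0.72, 0.78, 0.69, 0.71, 0.75, 0.70],
                  [0.78, 0.79, 0.71, 0.72, 0.75, 0.73, 0.78, 0.79, 0.75, 0.78]])
diameter = np.array([[12.0, 8.0, 8.7, 10.7, 9.7, 7.8, 9.7, 11.9, 11.2, 10.5],
                     [8.1, 7.9, 7.1, 6.7, 5.7, 7.3, 6.8, 7.0, 7.5, 8.0]])

def fisher_score(x):
    """Two-class Fisher-style ratio: (difference of class means)^2 / (sum of class variances)."""
    oranges, mandarins = x[0], x[1]
    return (oranges.mean() - mandarins.mean()) ** 2 / (oranges.var() + mandarins.var())

scores = {name: fisher_score(x)
          for name, x in [("Weight", weight), ("Color Intensity", color), ("Diameter", diameter)]}
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.2f}")
# Expected ordering: Weight > Diameter > Color Intensity, matching the slide's ranking
```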


Best Firsts: example
• The figures show a 4-dimensional pattern recognition problem (features are shown in pairs of 2D scatter plots)
• Objective: find the best subset composed of M = 2 features
• We rank the goodness of the features:
  1. x1 is the best feature, because it is able to discriminate all clusters (except for ω4 and ω5)
  2. x3 is the second best feature (it separates the space into the groups {ω1}, {ω2, ω3}, {ω4, ω5})
  3. x2 is the third best feature (it is very similar to x3: it separates the space into the groups {ω1, ω2}, {ω3}, {ω4, ω5})
  4. x4 is the worst feature, because there is a lot of overlap in its space, but it is the only one able to discriminate between ω4 and ω5
• The feature subset chosen by the Best Firsts approach would therefore be x1 and x3, which does not allow any discrimination between ω4 and ω5!!
• The real best 2D subset is x1 and x4, as x4 provides the only information that x1 lacks: the discrimination between ω4 and ω5!!


Objective Function

• We need to define:
  • a rule to analyse each possible subset → Search Algorithm
  • a way to evaluate each subset → Objective Function
• Filters: the objective function evaluates feature subsets by their information content, typically interclass distance, statistical dependence or information-theoretic measures
• Wrappers: the objective function is a pattern classifier, which evaluates feature subsets by their predictive accuracy (recognition rate on test data), estimated by statistical resampling or cross-validation


Filter Objective Function
• Distance between classes:
  • Euclidean
  • Mahalanobis
  • determinant of S_W^(-1) S_B
  • ...
• Correlation and information-theoretic measures:
  • these methods are based on the rationale that good feature subsets contain features highly correlated with the class and highly uncorrelated with each other
  • linear measure → Correlation Coefficient
  • non-linear measure → Mutual Information, which measures the amount by which the uncertainty in the class C is decreased by knowledge of the feature vector (it is based on the entropy function)

$$
J(Y_M) = \frac{\sum_{i=1}^{M} \rho_{ic}}{\sum_{i=1}^{M} \sum_{j=i+1}^{M} \rho_{ij}}
$$

where ρ_ic is the correlation coefficient between feature i and the class label, and ρ_ij is the correlation coefficient between feature i and feature j.
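A minimal sketch of this correlation-based filter, assuming the formula above and Pearson correlation as the measure; absolute values are used so that strongly negative correlations still count, which is a small deviation from the bare formula. The function name and the toy data are illustrative only.

```python
import numpy as np

def correlation_filter_j(X, y):
    """Correlation-based filter objective J(Y_M) for a candidate subset:
    sum of feature-class correlations divided by the sum of pairwise
    feature-feature correlations (absolute Pearson correlations).

    X: (n_samples, M) array holding the M candidate features.
    y: (n_samples,) array of numerically encoded class labels.
    """
    M = X.shape[1]
    rho_ic = sum(abs(np.corrcoef(X[:, i], y)[0, 1]) for i in range(M))
    rho_ij = sum(abs(np.corrcoef(X[:, i], X[:, j])[0, 1])
                 for i in range(M) for j in range(i + 1, M))
    return rho_ic / rho_ij if rho_ij > 0 else rho_ic

# Toy usage: two informative features plus one noise feature
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
X = np.column_stack([y + 0.3 * rng.standard_normal(100),
                     y + 0.5 * rng.standard_normal(100),
                     rng.standard_normal(100)])
print(correlation_filter_j(X[:, :2], y), correlation_filter_j(X, y))
```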


Filters vs Wrappers Objective Function

Filters
☺ Fast execution: they generally involve a non-iterative computation on the dataset
☺ Generality: they evaluate intrinsic properties of the data (the solution will be good for a large family of classifiers)
☹ Tendency to select large subsets: since the filter objective functions are generally monotonic, these methods tend to select the full feature set as the optimal solution; this forces the user to select a cutoff on the number of features to be selected

Wrappers
☺ Accuracy: wrappers generally achieve better recognition rates, since they are tuned to the specific interactions between the classifier and the dataset
☺ Generalization: using techniques such as cross-validation, they can avoid overfitting
☹ Slow execution: the wrapper must train a classifier for each feature subset (or several classifiers, if cross-validation is used)
☹ Problem dependent
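As an illustration of a wrapper objective (the classifier choice is not prescribed by the notes), the sketch below scores a candidate subset by the cross-validated accuracy of a k-nearest-neighbour classifier, assuming scikit-learn is available; the function and variable names are illustrative.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def wrapper_j(X, y, subset, cv=5):
    """Wrapper objective: mean cross-validated accuracy of a classifier
    trained only on the candidate subset (a list of column indices)."""
    clf = KNeighborsClassifier(n_neighbors=3)
    return cross_val_score(clf, X[:, list(subset)], y, cv=cv).mean()

# Toy usage: compare an informative subset against a noisy one
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
X = np.column_stack([y + 0.3 * rng.standard_normal(100),   # informative feature
                     rng.standard_normal(100),              # noise
                     rng.standard_normal(100)])             # noise
print(wrapper_j(X, y, [0]), wrapper_j(X, y, [1, 2]))
```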


Search Strategies
• There is a large number of search algorithms, which can be grouped into 3 categories:
  • Exponential Algorithms
    • these algorithms evaluate a number of subsets that grows exponentially with the dimensionality of the search space
    • Exhaustive Search
    • Branch & Bound
    • Approximate Monotonicity with Branch & Bound
  • Sequential Algorithms
    • these algorithms add and remove features sequentially
    • Sequential Forward Selection
    • Sequential Backward Selection
    • Bidirectional Selection
    • Plus-L Minus-R Selection
    • Sequential Floating Selection
  • Randomized Algorithms
    • these algorithms incorporate randomness into their search procedure to escape local minima
    • Random Generation plus Sequential Selection
    • Simulated Annealing
    • Genetic Algorithms


Exponential Search: Exhaustive Search
• A very naive approach
• We consider all possible combinations of features
• The number of combinations is unfeasible, even for moderate values of M and N
• It is guaranteed to find the optimal subset
• In our previous example (oranges vs mandarins) with 3 features, the total number of possible subsets was equal to:

$$
\binom{N}{M} = \frac{N!}{M!\,(N-M)!}
$$

$$
\sum_{M=1}^{3} \binom{3}{M}
= \frac{3!}{1!\,(3-1)!} + \frac{3!}{2!\,(3-2)!} + \frac{3!}{3!\,(3-3)!}
= \frac{6}{2} + \frac{6}{2} + \frac{6}{6}
= 3 + 3 + 1 = 7
$$

• That is, we obtain 3 subsets of 1 feature, 3 subsets of 2 features and 1 subset of 3 features

• With a more realistic number of features (let's say 10), we would obtain 1023 possible subsets!! (see the enumeration sketch below)
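A minimal sketch of exhaustive search over all non-empty subsets, assuming a generic objective function J (any filter or wrapper objective could be plugged in); the toy scores are invented purely to show the call.

```python
from itertools import combinations

def exhaustive_search(feature_names, J):
    """Evaluate J on every non-empty feature subset and return the best one.
    Feasible only for small N, since there are 2**N - 1 subsets."""
    best_subset, best_score = None, float("-inf")
    for m in range(1, len(feature_names) + 1):
        for subset in combinations(feature_names, m):
            score = J(subset)
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score

# Toy usage: 3 features as in the oranges vs mandarins example -> 7 subsets evaluated
toy_scores = {
    ("Weight",): 0.90, ("Diameter",): 0.70, ("Color",): 0.10,
    ("Weight", "Diameter"): 0.95, ("Weight", "Color"): 0.90,
    ("Diameter", "Color"): 0.72, ("Weight", "Diameter", "Color"): 0.95,
}
print(exhaustive_search(["Weight", "Diameter", "Color"], lambda s: toy_scores[s]))
```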


Exponential Search: Branch & Bound

• It uses the well-known Branch & Bound search method: only a fraction of all possible feature subsets needs to be enumerated to find the optimal subset
• It is based on the monotonicity assumption: "the addition of features can only increase the value of the objective function":
• Branch & Bound starts from the full set and removes features using a depth-first strategy
• Nodes whose objective function is lower than the current best are not explored, since the monotonicity assumption ensures that their children will not contain a better solution

$$
J(x_{i_1}) < J(x_{i_1}, x_{i_2}) < J(x_{i_1}, x_{i_2}, x_{i_3}) < \cdots < J(x_{i_1}, x_{i_2}, x_{i_3}, \ldots, x_{i_N})
$$
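A compact, unoptimized sketch of the idea, assuming a monotonic objective J: it removes features depth-first from the full set and prunes any node that can no longer beat the best subset of the target size found so far. The function and variable names are illustrative.

```python
def branch_and_bound(all_features, J, target_size):
    """Branch & Bound sketch under the monotonicity assumption: start from the
    full set, remove one feature at a time (depth-first), and prune any node
    whose objective is already no better than the current best bound."""
    best = {"subset": None, "score": float("-inf")}

    def search(subset, start):
        score = J(subset)
        if score <= best["score"]:
            return                                  # pruned: children cannot do better
        if len(subset) == target_size:
            best["subset"], best["score"] = subset, score
            return
        for k in range(start, len(subset)):
            search(subset[:k] + subset[k + 1:], k)  # branch: drop the k-th feature

    search(tuple(all_features), 0)
    return best["subset"], best["score"]

# Toy usage with a monotonic objective (sum of individual feature "values")
values = {"x1": 0.9, "x2": 0.5, "x3": 0.6, "x4": 0.2}
print(branch_and_bound(list(values), lambda s: sum(values[f] for f in s), target_size=2))
# -> (('x1', 'x3'), 1.5)
```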


Exponential Search: Approximate Monotonicity with Branch & Bound (AMB&B)

• AMB&B is a variation of the classic Branch & Bound algorithm
• It allows non-monotonic functions to be used, by relaxing the cutoff condition that terminates the search on a specific node
• For example, we can replace the limit on the number of features with a threshold error rate.


Sequential Search: Sequential Forward Selection (SFS)

• It is the simplest greedy search algorithm
• Select the best single feature and then add one feature at a time; the added feature is the one that maximizes the objective function in combination with the previously selected feature set
• Starting from the empty set, sequentially add the feature x+ that results in the highest objective function J(Y_k + x+) when combined with the features Y_k that have already been selected (see the sketch below)
• It performs best when the optimal subset has a small number of features
• Once a feature is retained, it cannot be discarded anymore
• Suboptimal solution!
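A minimal sketch of the greedy forward loop, assuming a generic objective J such as the filter or wrapper objectives sketched earlier; names are illustrative.

```python
def sfs(all_features, J, target_size):
    """Sequential Forward Selection: start from the empty set and greedily
    add the feature that maximizes J on the enlarged subset."""
    selected, remaining = [], list(all_features)
    while len(selected) < target_size and remaining:
        best_f = max(remaining, key=lambda f: J(tuple(selected) + (f,)))
        selected.append(best_f)
        remaining.remove(best_f)
    return tuple(selected)

# e.g. sfs(["Weight", "Color", "Diameter"], J, target_size=2) with any objective J
```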


Sequential Search: Sequential Backward Selection (SBS)

• Similar to SFS, but it works in the opposite direction
• Starting from the full feature set, sequentially delete the feature x- that results in the smallest decrease of the objective function J(Y_k - x-) (see the sketch below)
• It performs best when the optimal subset has a large number of features
• Once a feature is deleted, it cannot be brought back anymore
• Suboptimal solution!
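The backward counterpart of the previous sketch, again assuming a generic objective J; names are illustrative.

```python
def sbs(all_features, J, target_size):
    """Sequential Backward Selection: start from the full set and greedily
    remove the feature whose removal hurts J the least."""
    selected = list(all_features)
    while len(selected) > target_size:
        least_useful = max(selected,
                           key=lambda f: J(tuple(x for x in selected if x != f)))
        selected.remove(least_useful)
    return tuple(selected)
```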


Sequential Search: Bidirectional Search (BDS)

• BDS is a parallel implementation of SFS and SBS:
  • SFS is performed from the empty set
  • SBS is performed from the full set
• In order to guarantee that SFS and SBS converge to the same solution, we must ensure that:
  • features already selected by SFS are not removed by SBS
  • features already deleted by SBS are not selected by SFS
• Suboptimal solution!


Sequential Search: Plus-L Minus-R Selection (LRS)

• Plus-L Minus-R Selection is a generalization of SFS and SBS
• If L > R, LRS starts from the empty set and repeatedly adds 'L' features and removes 'R' features (see the sketch below)
• If L < R, LRS starts from the full set and repeatedly removes 'R' features followed by 'L' feature additions
• LRS attempts to compensate for the weaknesses of SFS and SBS with some backtracking capabilities
• Its main limitation is the lack of a theory to help predict the optimal values of L and R
• Suboptimal solution!
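A sketch of the L > R case, assuming target_size ≤ N and a generic objective J; the early return keeps the loop from overshooting the requested size. Names and the default values of L and R are illustrative.

```python
def plus_l_minus_r(all_features, J, target_size, L=2, R=1):
    """Plus-L Minus-R sketch (L > R): starting from the empty set, repeat L
    greedy forward steps followed by R greedy backward steps until the
    target size is reached; for L < R one would start from the full set."""
    assert R < L and target_size <= len(all_features)
    selected = []
    while True:
        for _ in range(L):                               # plus-L (as in SFS)
            remaining = [f for f in all_features if f not in selected]
            if not remaining:
                break
            selected.append(max(remaining, key=lambda f: J(tuple(selected) + (f,))))
            if len(selected) == target_size:
                return tuple(selected)
        for _ in range(R):                               # minus-R (as in SBS)
            least_useful = max(selected,
                               key=lambda f: J(tuple(x for x in selected if x != f)))
            selected.remove(least_useful)
```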


Sequential Search: Sequential Floating Selection (SFFS and SFBS)

• Sequential Floating Selection methods are an extension of the LRS algorithms with flexible values for L and R (see the sketch below)
• These values are determined automatically from the data and updated dynamically
• It gets very close to the optimal solution, at an affordable computational cost
• No guarantee to reach the optimal solution!
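A simplified sketch of the forward ("floating") variant, assuming a generic objective J: after each forward step, features are removed as long as the reduced subset improves on the best score previously recorded at that size. Full SFFS implementations track the best subset at every size; this condensed version only returns the subset reached at the target size.

```python
def sffs(all_features, J, target_size):
    """Sequential Floating Forward Selection (simplified): SFS forward steps
    interleaved with conditional backward steps, so L and R are effectively
    chosen by the data rather than fixed in advance."""
    selected, best_at_size = [], {}
    while len(selected) < target_size:
        # forward step (as in SFS)
        remaining = [f for f in all_features if f not in selected]
        selected.append(max(remaining, key=lambda f: J(tuple(selected) + (f,))))
        best_at_size[len(selected)] = max(best_at_size.get(len(selected), float("-inf")),
                                          J(tuple(selected)))
        # conditional backward steps (the "floating" part)
        while len(selected) > 2:
            candidate = max(selected,
                            key=lambda f: J(tuple(x for x in selected if x != f)))
            reduced = tuple(x for x in selected if x != candidate)
            if J(reduced) > best_at_size.get(len(reduced), float("-inf")):
                selected.remove(candidate)
                best_at_size[len(selected)] = J(reduced)
            else:
                break
    return tuple(selected)
```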


Randomized Search: Random Generation + Sequential Selection (RGSS)

• RGSS introduces randomness into SFS and SBS
• In this way it avoids falling into local minima
• We consider a number of random combinations of features and select the best one (see the sketch below)
• Obviously, there is no guarantee of finding the best solution

1. Repeat for a number of iterations:
   1a. Generate a random feature subset
   1b. Perform SFS on this subset
   1c. Perform SBS on this subset
2. Choose the best subset found
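A sketch of the procedure above, reusing the sfs and sbs helpers sketched earlier and assuming a generic objective J; the number of iterations and the random-subset sizes are arbitrary illustrative choices.

```python
import random

def rgss(all_features, J, target_size, n_iterations=20, seed=0):
    """Random Generation plus Sequential Selection: run SFS and SBS from
    several random starting subsets and keep the best subset found."""
    rng = random.Random(seed)
    best_subset, best_score = None, float("-inf")
    for _ in range(n_iterations):
        k = rng.randint(target_size, len(all_features))       # random subset size
        start = tuple(rng.sample(list(all_features), k))       # 1a. random subset
        for candidate in (sfs(start, J, target_size),           # 1b. SFS on it
                          sbs(start, J, target_size)):          # 1c. SBS on it
            score = J(candidate)
            if score > best_score:
                best_subset, best_score = candidate, score
    return best_subset, best_score                              # 2. best subset found
```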


Randomized Search: Genetic Algorithms

• Genetic Algorithms are optimization techniques that mimic the evolutionary process of survival of the fittest
• They explore the solution space looking for the solution that maximizes the objective function
• The choice of the parameters is not so simple...
• We will see more about Genetic Algorithms during the next lesson!


Feature Selection Search Strategies: Summary

Exponential Algorithms
• Methods: Exhaustive Search; Branch & Bound; Approximate Monotonicity with Branch & Bound
• Accuracy: always find the optimal solution (B&B under the monotonicity assumption)
• Complexity: exponential
• Notes: high accuracy, but high complexity

Sequential Algorithms
• Methods: Sequential Forward Selection (SFS); Sequential Backward Selection (SBS); Plus-L Minus-R Selection; Bidirectional Selection; Sequential Floating Selection
• Accuracy: no guarantee to find the optimal solution
• Complexity: quadratic, O(N_EX^2)
• Notes: simple and fast, but not optimal

Randomized Algorithms
• Methods: Random Generation plus Sequential Selection; Genetic Algorithms
• Accuracy: usually they find the optimal solution
• Complexity: generally low
• Notes: escape local minima, but it is difficult to choose good parameters