
Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)



The Branch & Bound and Beam Search algorithms are illustrated in the context of feature selection. The presentation is structured as follows: Motivation; Introduction; Analysis (Algorithm, Pseudo Code, Illustration of examples); Applications; Observations and Recommendations; Comparison between the two algorithms; References.


Page 1: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


Analysis of Feature Selection Algorithms

Branch and Bound | Beam Search algorithm

Parinda Rajapaksha UCSC

Page 2: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


ROAD MAP

• Motivation
• Introduction
• Analysis
– Algorithm
– Pseudo Code
– Illustration of examples
• Applications
• Observations and Recommendations
• Comparison between two algorithms
• References

Page 3: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


SECTION 1

Branch and Bound Algorithm

Page 4: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


MOTIVATION

• Optimal feature selection (subset selection) is difficult because of its computational complexity

• All subsets of a given cardinality have to be evaluated to find the optimal set of features among a large set of measurements

• Exhaustive search is impractical even for relatively small problems
– Finding the best 2 features out of a 10-feature set would already require evaluating 45 possible combinations

• This challenge has motivated years of work on speeding up the search process in the arena of feature selection

Page 5: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


• As a solution, the Branch and Bound (B&B) algorithm was developed by Narendra and Fukunaga in 1977

• It introduced heuristic measures that help identify parts of the search space which can be left unexplored without missing the optimal solution

• Guaranteed to find the optimal feature subset without evaluating all possible subsets

• B&B is an exponential search method

• Assumes that the feature selection criterion is monotonic

INTRODUCTION

Page 6: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


• For two given feature subsets (X, Y) and a feature selection criterion function (J):

• The property ensures that the values of the leaf nodes of a pruned branch cannot be better than the current bound

• It allows shortcuts to be taken in the search tree that represents the feature set optimization process

• This reduces the number of nodes and branches of the search tree that have to be explored

INTRODUCTION Monotonicity Property

X ⊂ Y ⟹ J(X) < J(Y)   Ex: X = {2,4} ⊂ Y = {2,4,5}
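To make the property concrete, here is a toy sketch (assumed scores for illustration, not from the deck): with a criterion defined as a sum of per-feature scores, any superset necessarily scores at least as high.

```python
# Toy monotonic criterion (assumed per-feature scores, not from the slides):
# a sum of feature scores can only grow when features are added.
scores = {1: 3.0, 2: 1.5, 3: 2.0, 4: 4.0, 5: 0.5}

def J(subset):
    """Criterion value of a feature subset: the sum of its feature scores."""
    return sum(scores[f] for f in subset)

X = {2, 4}                      # X ⊂ Y, as in the slide's example
Y = {2, 4, 5}
print(J(X), J(Y))               # 5.5 6.0
assert X < Y and J(X) <= J(Y)   # monotonicity: X ⊂ Y implies J(X) ≤ J(Y)
```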

Page 7: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


Feature set => {x1, x2, x3, x4, …, xn}

J(x1) < J(x1, x2) < J(x1, x2, x3) < … < J(x1, x2, x3, …, xn)

INTRODUCTION Monotonicity Property

[Diagram: nested subsets {x1} ⊂ {x1, x2} ⊂ … ⊂ {x1, x2, x3, …, xn}, with the criterion value J increasing along the chain]

Page 8: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


• Start from the full set of features and remove features using a depth-first strategy

• The monotonicity property must be satisfied in order to apply the algorithm

• Branching is the process of constructing the tree

• At each tree level, a limited number of sub-trees is generated by deleting one feature from the parent node's feature set

• Bounding is the process of finding the optimal feature set by traversing the constructed tree

ANALYSIS

Page 9: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


1. Construct an ordered tree satisfying the monotonicity property

Let Xj be the set of features obtained by removing j features y1, y2, …, yj from the set Y of all features:

Xj = Y \ {y1, y2, …, yj}

The monotonicity condition assumes that, for the nested feature subsets X1, X2, …, Xj, where

Xj ⊂ Xj−1 ⊂ … ⊂ X1

the criterion function J fulfills J(Xj) < J(Xj−1) < … < J(X1)

ANALYSIS Algorithm

Page 10: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


2. Traverse the tree from right to left in a depth-first search pattern

• If the criterion value at a given node is less than the bound (the value of the most recent best subset), all of its successors will also have criterion values below the bound

3. Pruning
• Whenever the criterion value J(Xm) at some internal node is found to be lower than the current bound, the monotonicity condition means the whole sub-tree can be cut off, and many computations can be omitted

• B&B spans a tree containing all possible r-element subsets of the n-element feature set, but searches only some of them

ANALYSIS Algorithm

Page 11: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


ANALYSIS Pseudo Code
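The pseudo code on this slide is an image in the original deck and did not survive extraction. As a stand-in, here is a minimal Python sketch of the algorithm just described (my own reconstruction, not the slide's pseudo code; it assumes a monotonic criterion function J and removes features in increasing index order, as explained on the Tree Properties slides):

```python
def branch_and_bound(features, J, r):
    """Find the best r-feature subset under a monotonic criterion J.

    A minimal sketch: J(subset) must never increase when a feature
    is removed (monotonicity), so a node whose value is already at or
    below the bound cannot have a better leaf beneath it.
    """
    n = len(features)
    best_value, best_subset = float("-inf"), None

    def search(subset, last_idx):
        nonlocal best_value, best_subset
        value = J(subset)
        if value <= best_value:      # bounding: by monotonicity, no
            return                   # descendant can beat the current bound
        if len(subset) == r:         # leaf: a candidate target subset
            best_value, best_subset = value, subset
            return
        # Branching: remove one more feature, in increasing index order.
        # The range cap keeps the tree complete (enough higher-numbered
        # features must remain removable to reach subsets of size r).
        for i in range(last_idx + 1, n - (len(subset) - r) + 1):
            search(subset - {features[i]}, i)

    search(frozenset(features), -1)
    return best_subset, best_value
```

With the toy scoring criterion from the monotonicity sketch above, `branch_and_bound([1, 2, 3, 4, 5], J, 2)` would return the subset {1, 4} with value 7.0 — the two highest-scoring features — while pruning every branch whose criterion value falls below the running bound.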

Page 12: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


The root of the tree represents the set of all features (n) and the leaves represent target subsets of r features

At each tree level, a limited number of sub-trees is generated by deleting one feature from the parent node's feature set

ANALYSIS Tree Properties

Root (all n features): {X1, X2, X3}
Leaves (target subsets of r features): remove X1 → {X2, X3}; remove X2 → {X1, X3}; remove X3 → {X1, X2}
(edge labels show the removed feature)

Page 13: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


In practice, features are only allowed to be removed in increasing order. This removes unnecessary repetition in the calculations; as a result, the tree is not symmetrical.

ANALYSIS Tree Properties

From {X1, X2, X3, X4}: removing X1 then X2 gives {X3, X4}; removing X2 then X1 gives the same {X3, X4} — a repetition, reached by removing features out of increasing order, so that path is excluded from the tree

Page 14: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


Number of leaf nodes in tree = nCr

Number of levels = n – r

Ex:

No of leaf nodes = 3C2 = 3

No of levels = 3 – 2 = 1

ANALYSIS Tree Properties

{X1, X2, X3} → leaves {X2, X3}, {X1, X3}, {X1, X2} (removed features X1, X2, X3): 3 features reduced to 2
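These counts are easy to verify directly; a quick standard-library check for this example, the 5 → 2 example that follows, and Example 2 (10 → 6):

```python
from math import comb

for n, r in [(3, 2), (5, 2), (10, 6)]:
    print(f"n={n}, r={r}: {comb(n, r)} leaf nodes, {n - r} levels")
# n=3, r=2: 3 leaf nodes, 1 levels
# n=5, r=2: 10 leaf nodes, 3 levels
# n=10, r=6: 210 leaf nodes, 4 levels
```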

Page 15: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


How can 5 features be reduced to 2 using the B&B algorithm?

Finding the best 2 features from the full set of features

EXAMPLE

{1, 2, 3, 4, 5} → {?, ?}

Page 16: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


Identify the tree properties:

- No of levels = 5 − 2 = 3 (subset sizes shrink 5 → 4 → 3 → 2)

- No of leaf nodes = 5C2 = 10

- Choose a criterion function J(X).

EXAMPLE Branching Step 1

Page 17: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


EXAMPLE Branching Step 2

L0: {1,2,3,4,5}
L1: remove 1 → {2,3,4,5}; remove 2 → {1,3,4,5}; remove 3 → {1,2,4,5}

Note: if feature 4 or 5 were removed at this first level, the tree could not be completed — there would not be enough higher-numbered features left to remove at the next levels.

Page 18: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


EXAMPLE Branching Step 3

L0: {1,2,3,4,5}
L1: remove 1 → {2,3,4,5}; remove 2 → {1,3,4,5}; remove 3 → {1,2,4,5}
L2: from {2,3,4,5}: remove 2 → {3,4,5}, 3 → {2,4,5}, 4 → {2,3,5}; from {1,3,4,5}: remove 3 → {1,4,5}, 4 → {1,3,5}; from {1,2,4,5}: remove 4 → {1,2,5}

Page 19: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


EXAMPLE Branching Step 4

L0: {1,2,3,4,5}
L1: remove 1 → {2,3,4,5}; remove 2 → {1,3,4,5}; remove 3 → {1,2,4,5}
L2: from {2,3,4,5}: remove 2 → {3,4,5}, 3 → {2,4,5}, 4 → {2,3,5}; from {1,3,4,5}: remove 3 → {1,4,5}, 4 → {1,3,5}; from {1,2,4,5}: remove 4 → {1,2,5}
L3: from {3,4,5}: remove 3 → {4,5}, 4 → {3,5}, 5 → {3,4}; from {2,4,5}: remove 4 → {2,5}, 5 → {2,4}; from {2,3,5}: remove 5 → {2,3}; from {1,4,5}: remove 4 → {1,5}, 5 → {1,4}; from {1,3,5}: remove 5 → {1,3}; from {1,2,5}: remove 5 → {1,2}

Page 20: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


EXAMPLE Criterion Values

• Assume the criterion function J(X) gives the following values, which satisfy the monotonicity property:

J({1,2,3,4,5}) = 15
L1: J({2,3,4,5}) = 10, J({1,3,4,5}) = 12, J({1,2,4,5}) = 11
L2: J({3,4,5}) = 6, J({2,4,5}) = 7, J({2,3,5}) = 8, J({1,4,5}) = 8, J({1,3,5}) = 10, J({1,2,5}) = 9
L3: J({4,5}) = 3, J({3,5}) = 4, J({3,4}) = 5, J({2,5}) = 5, J({2,4}) = 6, J({2,3}) = 7, J({1,5}) = 6, J({1,4}) = 7, J({1,3}) = 9, J({1,2}) = 8

Page 21: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


EXAMPLE Backtracking

• Calculate the criterion values using the J(X) function (values are assumed)
• Set the rightmost leaf value as the initial bound (this branch has the minimum number of child nodes and edges)

Set bound: current value = 8 (rightmost leaf {1,2}), bound = 8

Page 22: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


EXAMPLE Backtracking

• Continue the depth-first traversal through a node while its criterion value ≥ bound
• Update the bound whenever the traversal reaches a leaf node

Update bound: current value = 9 (leaf {1,3}), bound = 9

Page 23: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


EXAMPLE Backtracking

• If the current node value ≤ bound, discard the branches below it (prune)
• The bound does not change

Prune: current value = 8, bound = 9

Page 24: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


EXAMPLE Backtracking

• Repeat the previous steps

Prune again: current value = 8, bound = 9

Page 25: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


EXAMPLE Backtracking

15

10 12 11

6 7 8 8 10 9

3 4 5 5 6 7 6 7 9 8

• Maximum bound in leaf nodes = 9• Optimal feature subset = {1,3}• Note that the some subsets in L3 can be omitted without calculating

XX

Current V = 6 Bound = 9

XX

{1,3}
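The whole worked example can be replayed with the sketch from the Pseudo Code slide, using the assumed criterion values as a lookup table. The code scans branches left to right rather than right to left as in the slides, but it reaches the same optimum:

```python
# Replaying the worked example: the assumed criterion values from the
# Criterion Values slide, keyed by feature subset (monotonic by construction).
J_values = {
    frozenset({1, 2, 3, 4, 5}): 15,
    frozenset({2, 3, 4, 5}): 10, frozenset({1, 3, 4, 5}): 12,
    frozenset({1, 2, 4, 5}): 11,
    frozenset({3, 4, 5}): 6, frozenset({2, 4, 5}): 7, frozenset({2, 3, 5}): 8,
    frozenset({1, 4, 5}): 8, frozenset({1, 3, 5}): 10, frozenset({1, 2, 5}): 9,
    frozenset({4, 5}): 3, frozenset({3, 5}): 4, frozenset({3, 4}): 5,
    frozenset({2, 5}): 5, frozenset({2, 4}): 6, frozenset({2, 3}): 7,
    frozenset({1, 5}): 6, frozenset({1, 4}): 7, frozenset({1, 3}): 9,
    frozenset({1, 2}): 8,
}

best, value = branch_and_bound([1, 2, 3, 4, 5], lambda s: J_values[s], 2)
print(sorted(best), value)   # [1, 3] 9 — matching the slides
```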

Page 26: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


Reduce 10 features to 6 features

No of levels = 10 - 6 = 4

No of leaf nodes = 10C6 = 210

EXAMPLE 2

{1, 2, 3, 4, 5, 6, 7, 8, 9, 10} → {?, ?, ?, ?, ?, ?}   (n = 10, r = 6)

Page 27: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


[Tree sketch: at level 1 one of features 1–7 is removed; each later level removes a higher-numbered feature (up to 8, 9, 10 on the rightmost branches), giving 4 levels and 210 leaf nodes]

EXAMPLE 2 Reduce 10 features to 6

Page 28: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


Evaluation of Feature Selection Techniques for Analysis of Functional MRI and EEG

– This paper compares the performance of classical sequential methods and the B&B algorithm when applied to functional Magnetic Resonance Imaging (MRI) and intracranial EEG data to classify pathological events

– 12 features were used for the MRI data and 14 features for the EEG data

– The results of this work contradict the claim, made in several sources, that the B&B algorithm is an optimal search algorithm for feature selection

APPLICATIONS B & B

Page 29: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


– The algorithm fails to create subsets with better classification accuracy in this application

APPLICATIONS B & B

Classification accuracy as a function of subset size for the MRI data

Page 30: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


• Every B&B algorithm requires additional computations
– not only the target subsets of r features, but also their supersets (up to the full set of n features) have to be evaluated

• It does not guarantee that enough sub-trees will be cut off to keep the total number of criterion computations below that of exhaustive search

• In the worst case, the criterion function is computed at every tree node
– the same as exhaustive search

OBSERVATIONS & RECOMMENDATION

Page 31: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


• Criterion value computation is usually slower near the root
– the evaluated feature subsets are larger, e.g. J(X1, X2, …, Xn)

• Sub-tree cut-offs are less frequent near the root
– the higher criterion values that come with larger subsets are compared against a bound that is only updated at the leaves

• The B&B algorithm usually spends most of its time on tedious, less promising evaluation of tree nodes in the levels closer to the root

• This effect is to be expected, especially when r ≪ n

OBSERVATIONS & RECOMMENDATION

Page 32: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


SECTION 2

Beam Search Algorithm

Page 33: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


• Beam search is a heuristic method for solving combinatorial optimization problems

• It is similar to breadth-first search as it progresses level by level

• Only the most promising nodes at each level of the search tree are selected for further branching, while the remaining nodes are pruned off permanently

• Beam search was first used in the artificial intelligence community for speech recognition and image understanding problems

• The running time is polynomial in the problem size

INTRODUCTION

Page 34: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


a) Compute the classifier performance using each of the n features individually (n 1-tuples)

b) Select the best K (beam-width) features based on a pre-defined selection criterion among these 1-tuples

c) Add a new feature to each of these K features, forming K(n−1) 2-tuples of features. The tuple-size t is equal to 2 at this stage

d) Evaluate the performance of each of these t-tuples. Of these, select the best K, based on classification performance.

ANALYSIS Algorithm

Page 35: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


e) Form all possible (t + 1)-tuples by appending to each of these K t-tuples the features not already in that tuple

f) Repeat steps d) and e) until the stopping criterion is met; the tuple size at this stage is m.

g) The best K m-tuples are the result of beam search.

ANALYSIS Algorithm

Page 36: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


ANALYSIS Pseudo Code
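As with the B&B slide, the pseudo code here is an image in the original deck. A minimal Python sketch of the forward beam search described in steps a)–g) (my own reconstruction; `evaluate` is a hypothetical stand-in for the criterion function or classifier-performance measure):

```python
def beam_search(features, evaluate, target_size, beam_width):
    """Greedy forward feature selection with a fixed beam width K.

    A sketch of steps a)-g): score all 1-tuples, keep the best K, then
    repeatedly extend each kept tuple by one unused feature and keep the
    best K of the new tuples, until the target size is reached.
    """
    # a)-b): evaluate each feature individually, keep the best K 1-tuples
    beam = sorted((frozenset([f]) for f in features),
                  key=evaluate, reverse=True)[:beam_width]

    # c)-f): grow the tuples one feature at a time; pruned nodes are
    # discarded permanently (no backtracking)
    while len(next(iter(beam))) < target_size:
        candidates = [s | {f} for s in beam for f in features if f not in s]
        # Duplicates (e.g. {1,3} reached from {1} and from {3}) are not
        # avoided, matching the observation in the worked example.
        beam = sorted(candidates, key=evaluate, reverse=True)[:beam_width]

    return beam  # g): the best K target-size subsets found
```

Each level evaluates at most K(n − 1) candidate tuples, which is what keeps the overall running time polynomial in the problem size.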

Page 37: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


1. Start with the empty set (no features) and evaluate each feature individually
– Values can be calculated using a criterion function or by evaluating classifier performance
2. Choose a value for the beam width (K)
– K defines the number of subsets carried forward to the next level
3. Carry the best K subsets to the next level
– A cut-off value can be checked before selecting the best subsets
4. Add a new, previously unused feature to each of the selected feature subsets
5. Repeat the process until the tree reaches the target subset size
– Alternatively, a stopping criterion can be defined to terminate the process

ANALYSIS Easy 5 Steps

Page 38: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


How can 5 features be reduced to 3 using the Beam Search algorithm?

Finding the best 3 features from the full set of features

EXAMPLE

{1, 2, 3, 4, 5} → {?, ?, ?}

Page 39: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


• Start with the empty subset { } and evaluate each feature individually (values are assumed)

Individual values: features 1, 3 and 5 receive the highest assumed scores (30, 28, 25); features 2 and 4 the lowest (14, 16)

Page 40: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


• Select the best K (beam width) features based on a pre-defined selection criterion (assume K = 3)

Best features: {1}, {3}, {5}

Page 41: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


• Add a new feature to each of the selected subsets. Order is not important; duplications cannot be avoided.

{1}+2: 31, {1}+3: 35, {1}+4: 60, {1}+5: 39
{3}+1: 30, {3}+2: 55, {3}+4: 40, {3}+5: 50
{5}+1: 34, {5}+2: 35, {5}+3: 34, {5}+4: 48

Page 42: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


• Choose the best K performing subsets among the new feature sets.

EXAMPLE Step 4

Best 2-feature subsets: {1,4} (60), {3,2} (55), {3,5} (50)

Page 43: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


• Carry the best K subsets to the next level by adding each of the remaining features.

EXAMPLE Step 5

{1,4}+2: 45, {1,4}+3: 40, {1,4}+5: 70
{3,2}+1: 56, {3,2}+4: 58, {3,2}+5: 88
{3,5}+1: 67, {3,5}+2: 62, {3,5}+4: 75

Page 44: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


• The tree has reached 3 features, the target subset size. The maximum value gives the best feature set.

EXAMPLE Step 6

Best 3-feature subsets: {3,2,5} (88), {3,5,4} (75), {1,4,5} (70) → best feature set {2,3,5} with value 88

Page 45: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


Beam Search for Feature Selection in Automatic SVM Defect Classification

– In this paper, beam search is implemented with a support vector machine (SVM) classifier to select the candidate subsets

– Improvements to the beam search algorithm for feature selection are proposed; the modified version is called Smart Beam Search (SBS)

– Each defect in the data set is described by a high-dimensional feature vector of about 100 features

APPLICATIONS Beam Search

Page 46: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


– The data set comprises about 3000 images with 13 defect classes; results are presented for beam widths K = 2 and K = 5

– The SBS feature selection approach reduced the dimensionality of the feature space and increased classifier performance

APPLICATIONS Beam Search

Overall accuracy using features selected by SBS

Page 47: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


• There is no backtracking, since the intent of this technique is to search quickly

• Therefore, beam search methods are not guaranteed to find an optimal solution and cannot recover from wrong decisions

• Duplications cannot be avoided in the tree
• If a node leading to the optimal solution is discarded during the search, there is no way to reach that optimal solution afterwards

• The beam width parameter K is fixed before the search starts

• A wider beam width allows greater safety, but it increases the computational cost

OBSERVATIONS & RECOMMENDATIONS

Page 48: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


COMPARISON

Branch and Bound | Beam Search
Follows a depth-first strategy | Similar to breadth-first search
Guaranteed to find the optimal feature subset | Not guaranteed to find the optimal feature subset
Exponential search | Polynomial running time in the problem size
Backtracking needed to prune unnecessary subsets | No backtracking needed
Additional computations needed to backtrack after constructing the tree | No additional computations needed after constructing the tree
Must satisfy the monotonicity property | No need to consider the monotonicity property
Duplicate subsets are omitted | Duplications cannot be avoided

Page 49: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


REFERENCES

1. Narendra, P. M., & Fukunaga, K. (1977). A branch and bound algorithm for feature subset selection. IEEE Transactions on Computers, C-26(9), 917–922.

2. Somol, P., Pudil, P., & Kittler, J. (2004). Fast branch & bound algorithms for optimal feature selection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(7), 900–912.

3. Burrell, L., Smart, O., Georgoulas, G. K., Marsh, E., & Vachtsevanos, G. J. (2007). Evaluation of feature selection techniques for analysis of functional MRI and EEG. In DMIN (pp. 256–262).

4. Gupta, P., Doermann, D., & DeMenthon, D. (2002). Beam search for feature selection in automatic SVM defect classification. In Proceedings of the 16th International Conference on Pattern Recognition (Vol. 2, pp. 212–215). IEEE.

Page 50: Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)


REFERENCES

5. Dashti, M. T., & Wijs, A. J. (2007). Pruning state spaces with extended beam search. In Automated Technology for Verification and Analysis (pp. 543-552). Springer Berlin Heidelberg.

6. Valente, J., & Alves, R. A. (2005). Filtered and recovering beam search algorithms for the early/tardy scheduling problem with no idle time. Computers & Industrial Engineering, 48(2), 363-375.