FD FS High Dimensional Data


    FEATURE DISCRETIZATION AND SELECTION

TECHNIQUES FOR HIGH-DIMENSIONAL DATA

    Artur J. Ferreira

Supervisor: Prof. Mario A. T. Figueiredo

Instituto Superior de Engenharia de Lisboa / Instituto de Telecomunicações, Lisboa

Priberam Machine Learning Lunch Seminar, 17 April 2012


    Outline

    Introduction

Background: High-Dimensional Data; Feature Discretization (FD); Feature Selection (FS)

FD Proposals: Static Unsupervised Proposals; Static Supervised Proposal

FS Proposals: Feature Selection Proposals

    Conclusions

    Some Resources


Introduction and Motivation
Some well-known facts about machine learning:

1. An adequate (sometimes discrete) representation of the data is necessary

2. High-dimensional datasets are increasingly common

3. Learning from high-dimensional data is challenging

4. Sometimes the curse-of-dimensionality problem arises: a small number of instances n and a large dimensionality d (e.g., microarray or text categorization data); it must be addressed in order to have effective learning algorithms

5. Feature discretization (FD) and feature selection (FS) techniques address these problems: they achieve adequate representations and select an adequate subset of features with a convenient representation



High-Dimensional Data
Some high-dimensional datasets with different types of problems are available on the Web. Datasets with c classes and n instances are shown by increasing dimensionality d; notice that in some cases d >> n.

Dataset        d        c   n     Type of Data
Colon          2000     2   62    Microarray
SRBCT          2309     4   83    Microarray
AR10P          2400     10  130   Face
PIE10P         2420     10  210   Face
TOX-171        5748     4   171   Microarray
Example1       9947     2   50    Text, BoW
ORL10P         10304    10  100   Face
11-Tumors      12553    11  174   Microarray
Lung-Cancer    12601    5   203   Microarray
SMK-CAN-187    19993    2   187   Microarray
Dexter         20000    2   2600  BoW
GLI-85         22283    2   85    Microarray
Dorothea       1000000  2   1950  Drug Discovery


High-Dimensional Data
These datasets are available online:
  The well-known University of California at Irvine (UCI) repository, archive.ics.uci.edu/ml/datasets.html
  The Gene Expression Model Selector (GEMS) project, at www.gems-system.org/


High-Dimensional Data
The recently developed Arizona State University (ASU) repository, featureselection.asu.edu/datasets.php, has high-dimensional datasets.


Feature Discretization
Feature discretization (FD) aims at:
  representing a feature with a set of symbols from a finite set
  keeping enough information for the learning task
  ignoring minor (noisy/irrelevant) fluctuations in the data
FD can be performed by:
  unsupervised or supervised methods; the latter use class labels to compute the discretization intervals
  static or dynamic methods: static methods compute the discretization intervals using solely the training data, whereas dynamic methods rely on a wrapper approach with a quantizer and a classifier


Feature Discretization: Static Unsupervised Approaches
Three static unsupervised techniques are commonly used for FD (see the sketch below):
  equal-interval binning (EIB) - uniform quantization
  equal-frequency binning (EFB) - non-uniform quantization, to attain a uniform distribution of the discrete symbols
  proportional k-interval discretization (PkID) - the number and size of the discretization intervals are adjusted to the number of training instances
As compared to both EIB and EFB, PkID provides naive Bayes classifiers with:
  competitive classification performance on smaller datasets
  better classification performance on larger datasets
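As an illustration (ours, not the talk's code), a minimal MATLAB sketch of EIB and EFB for a single feature; the bin count k, the variable names, and the PkID rule quoted in the final comment are assumptions.

% Minimal sketch (our illustration): EIB and EFB discretization of one
% feature x (an n-by-1 column vector) into k bins.
n = numel(x);
k = 8;                                      % e.g., q = 3 bits -> 2^3 bins

% Equal-interval binning (EIB): uniform quantization of the range of x.
edgesEIB = linspace(min(x), max(x), k + 1);
xEIB = discretize(x, edgesEIB);             % bin index in {1, ..., k}

% Equal-frequency binning (EFB): non-uniform bins with roughly equal counts;
% bin edges are taken from the sorted (order-statistic) values of x.
xs = sort(x);
edgesEFB = unique(xs(round(linspace(1, n, k + 1))));
edgesEFB([1 end]) = [-inf, inf];            % open the outer bins
xEFB = discretize(x, edgesEFB);

% PkID (assumption about the usual rule, not stated on the slide): use about
% sqrt(n) intervals, each holding about sqrt(n) instances, k = floor(sqrt(n)).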


Feature Discretization: Static Supervised Approaches
Some static supervised techniques for FD:
  information entropy maximization (IEM), Fayyad and Irani, 1993
  minimal description length (MDL), Kononenko, 1995
  class-attribute interdependence maximization (CAIM), 2004
  class-attribute contingency coefficient (CACC), 2008
  correlation maximization (CM), 2011
Empirical evidence shows that:
  the IEM and MDL methods have good performance regarding accuracy and running time
  the CACC and CAIM methods have higher running time than both IEM and MDL; in some cases, they attain better results than these methods


Feature Discretization: Weka's Environment
We can find IEM (Fayyad and Irani, 1993) and MDL (Kononenko, 1995) in Weka's machine learning package, as weka.filters.supervised.attribute.Discretize
Note: the unsupervised PkID method is also available.


Feature Selection
Feature selection (FS) is a central problem in machine learning and pattern recognition:
  what is the best subset of features for a given problem?
  how many features should we choose?
There are many recent papers (published in 2012) regarding FS.
FS can be performed by:
  unsupervised or supervised methods; the latter use class labels
  filter, wrapper, or embedded approaches


Feature Selection: Filter Approach
The FS filter approach:
  assesses the quality of a given feature subset using solely characteristics of that subset
  does not rely on the use of any learning algorithm
There are many (successful) filter approaches for FS:
  unsupervised: Term-Variance (TV), Laplacian Score (LS), Laplacian Score Extended (LSE), SPEC, ...
  supervised: Relief, ReliefF, CFS, FiR, ...
  supervised: FCBF, mrMR, MIM, CMIM, IG, ...


Feature Selection: Information-Theoretic Filters
Claude Shannon's information theory (IT) has been the basis for the proposal of many filter methods (IT-based filters):

Brown, G., Pocock, A., Zhao, M., Lujan, M., 2012. Conditional likelihood maximisation: A unifying framework for information theoretic feature selection. Journal of Machine Learning Research 13, 27-66.


Feature Selection: Wrapper Approach
Wrapper approaches:
  use some method to search the space of all possible subsets of features
  assess the quality of a subset by learning and evaluating a classifier with that feature subset
Combinatorial search makes wrappers inadequate for high-dimensional data.
Some recent wrapper approaches for FS (2009-2011):
  greedy randomized adaptive search procedure (GRASP)
  a GRASP-based hybrid filter-wrapper FS method
  a sparse model for high-dimensional data, based on linear programming
  estimation of the redundancy between feature sets, by the conditional MI between feature subsets and each class


Feature Selection: Embedded Approach
The embedded (or integrated) approach:
  simultaneously learns the classifier and chooses a subset of features
  assigns weights to features
  the objective function encourages some of these weights to become zero
Examples of the embedded approach are:
  sparse multinomial logistic regression (SMLR)
  the sparse logistic regression (SLogReg) method
  Bayesian logistic regression (BLogReg), a more adaptive version of SLogReg
  joint classifier and feature optimization (JCFO), which applies FS inside a kernel function


Feature Selection on High-Dimensional Data
How difficult/challenging is FS on a given dataset?

The ratio [1]

    R_FS = n / (a c),

with n patterns (instances), c classes, and a the median arity of the features (discretized with EFB), measures this difficulty. Low values imply more difficult FS problems (a small computation sketch is given below, after the reference).

Another question: shouldn't the d/n ratio also be taken into account?

[1] Brown, G., Pocock, A., Zhao, M., Lujan, M., 2012. Conditional likelihood maximisation: A unifying framework for information theoretic feature selection. Journal of Machine Learning Research 13, 27-66.
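A minimal MATLAB sketch (our illustration, not from the talk) of how this ratio might be computed from an already EFB-discretized data matrix; the variable names Xd and y are assumptions.

% Minimal sketch: the FS-difficulty ratio R_FS = n / (a*c), computed from an
% EFB-discretized data matrix Xd (n-by-d integer codes) and class labels y.
[n, d] = size(Xd);
c = numel(unique(y));                     % number of classes
arity = zeros(1, d);
for i = 1:d
    arity(i) = numel(unique(Xd(:, i)));   % arity (number of symbols) of feature i
end
a = median(arity);                        % median feature arity
R_FS = n / (a * c);                       % lower value -> harder FS problem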


Feature Selection: an (outdated) categorization

Liu, H. and Yu, L., Toward Integrating Feature Selection Algorithms for Classification and Clustering, IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 4, April 2005, pp. 491-502.


Feature Discretization: Static U-LBG1 Algorithm
Recently, we have proposed the use of the Linde-Buzo-Gray (LBG) algorithm for unsupervised static FD [2] (see the sketch below, after the reference):
  The LBG algorithm is applied individually to each feature, leading to discrete features with minimum mean square error (MSE)
  Rationale: a low MSE(Xi, Q(Xi)) is adequate for learning! It is stopped when:
    the MSE distortion falls below some threshold, or
    the maximum number of bits q per feature is reached
  obtains a variable number of bits per feature
  uses the distortion threshold and q as input parameters; setting the threshold to 5% of the range of each feature and q in {4, ..., 10} is adequate

[2] A. Ferreira and M. Figueiredo, An unsupervised approach to feature discretization and selection, Pattern Recognition (Elsevier), DOI: 10.1016/j.patcog.2011.12.008
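A minimal MATLAB sketch (our illustration, not the authors' code) of the U-LBG1 idea: grow the number of bits per feature until the per-feature MSE falls below a threshold or the bit budget q is exhausted. The simple 1-D Lloyd iteration below is a stand-in for the LBG algorithm, and the interpretation of the 5%-of-range threshold is an assumption.

% U-LBG1-style discretization sketch: variable number of bits per feature.
function [Xq, bits] = ulbg1_sketch(X, q)
  [n, d] = size(X);
  Xq   = zeros(n, d);                      % discretized data (codeword indices)
  bits = zeros(1, d);                      % bits allocated to each feature
  for i = 1:d
    x = X(:, i);
    delta = 0.05 * (max(x) - min(x));      % distortion threshold (assumed: 5% of the range)
    for b = 1:q
      [idx, mse] = lloyd1d(x, 2^b);        % scalar quantizer with 2^b levels
      if mse <= delta || b == q
        Xq(:, i) = idx;  bits(i) = b;  break;
      end
    end
  end
end

function [idx, mse] = lloyd1d(x, k)
  % simple 1-D Lloyd iteration (stand-in for LBG)
  c = linspace(min(x), max(x), k)';        % initial codebook
  for it = 1:50
    [~, idx] = min(abs(x - c'), [], 2);    % nearest codeword for each sample
    for j = 1:k
      if any(idx == j), c(j) = mean(x(idx == j)); end
    end
  end
  mse = mean((x - c(idx)).^2);             % per-feature distortion
end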


Feature Discretization: U-LBG2 Algorithm
Similar to U-LBG1 (both aim at obtaining quantizers that represent the features with small distortion), with the following key differences:
  each discretized feature is given the same (maximum) number of bits q
  only one quantizer is learned for each feature


Experimental Results: Unsupervised Discretization
Discretization performance:
  total number of bits per pattern (T. Bits) and test set error rate (Err, average of ten runs)
  using up to q = 7 bits, with the naive Bayes classifier

                 Original   EFB             U-LBG1          U-LBG2
Dataset          Err        T. Bits  Err    T. Bits  Err    T. Bits  Err
Phoneme          21.30      30    22.30     9     22.80     30    20.60
Pima             25.30      48    25.20     30    25.20     48    25.80
Abalone          28.00      48    27.60     15    27.20     48    27.70
Contraceptive    34.80      54    31.40     15    38.00     54    34.80
Wine              3.73      78     4.80     27     3.20     78     3.20
Hepatitis        20.50      95    21.50     32    21.00     39    18.00
WBCD              5.87      180    5.13     60     5.87     180    5.67
Ionosphere       10.60      198    9.80     49    17.40     198   11.00
SpamBase         15.27      324   13.40     54    15.73     324   15.67
Lung             35.00      318   35.00     74    35.83     318   35.00
Arrhythmia       32.00      1392  51.56     553   30.22     1392  41.56


Some insight on the accuracy of each feature

Figure: test set error rate of naive Bayes using only a single feature, discretized with q = 4 bits by the U-LBG2 algorithm, on the WBCD dataset. The horizontal dashed line is the test set error rate using all p = 30 features.


Some insight on the accuracy of each feature

Figure: test set error rate of naive Bayes using only a single feature, discretized with q = 8 bits by the U-LBG2 algorithm, on the WBCD dataset. The horizontal dashed line is the test set error rate using all p = 30 features.


Some insight on progressive discretization 1/2

Figure: test set error rate of naive Bayes using solely feature 17 of the WBCD dataset (original feature and U-LBG2 discretized with q in {1, ..., 10}). The discrete versions do not provide higher accuracy!


Some insight on progressive discretization 2/2

Figure: test set error rate of naive Bayes using solely feature 25 of the WBCD dataset (original feature and U-LBG2 discretized with q in {1, ..., 10}). With q = 10, we get a small improvement in accuracy.


Experimental Results: Analysis
Some comments on these methods and results:
  Often, discretization improves classification accuracy
  EFB usually attains better results than EIB
  U-LBG2 is usually faster than U-LBG1, but allocates more bits per feature
  Often, one of the LBG methods attains better results than EFB
  The U-LBG procedures are more complex than either EIB or EFB
  PkID attains results close to EFB, when using the naive Bayes classifier


Feature Discretization: S-LBG Algorithm (ideas)
Supervised version of the U-LBG counterparts:
  each feature is discretized with LBG
  it uses the mutual information (MI) between the discretized feature and the class label to control the discretization procedure
  it stops at b bits, or when the relative increase of MI with respect to the previous quantizer is less than a given threshold
  each feature is discretized with an increasing number of bits, stopping only when:
    there is no significant increase in the relevance of the feature, or
    the maximum number of bits is reached
A sketch of the MI stopping criterion is given below.
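A minimal MATLAB sketch (our illustration, not the authors' code) of the empirical MI between a discretized feature and the class labels, with a commented-out example of the relative-increase stopping rule; quantize_with_b_bits and threshold are hypothetical names.

% Empirical mutual information between a discretized feature xq (integer
% codes, n-by-1) and the class labels y (n-by-1), in bits.
function mi = discrete_mi(xq, y)
  xv = unique(xq);  yv = unique(y);  n = numel(xq);
  mi = 0;
  for a = 1:numel(xv)
    for b = 1:numel(yv)
      pxy = sum(xq == xv(a) & y == yv(b)) / n;   % joint probability
      px  = sum(xq == xv(a)) / n;
      py  = sum(y  == yv(b)) / n;
      if pxy > 0
        mi = mi + pxy * log2(pxy / (px * py));
      end
    end
  end
end

% Example stopping rule: add bits while the relative MI gain is significant.
% for b = 1:q
%   xq = quantize_with_b_bits(x, b);             % hypothetical quantizer
%   mi_b = discrete_mi(xq, y);
%   if b > 1 && (mi_b - mi_prev) / mi_prev < threshold, break; end
%   mi_prev = mi_b;
% end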


Feature Discretization: MI Algorithm (ideas)
Ongoing work with the following key ideas:
  discretize each feature so as to maximize its MI with the class label
  do this in a progressive approach, scanning all features
  allocate a variable number of bits per feature
  check how the relevance of each feature changes when discretized with b+1 bits, as compared to b bits


Experimental Results
Discretization performance:
  total number of bits per pattern (TBits) and test set error rate (Err, average of ten runs)
  using up to q = 7 bits, with the naive Bayes classifier

              Base    EFB           U-LBG1        U-LBG2        S-LBG
Dataset       Err     TBits  Err    TBits  Err    TBits  Err    TBits  Err
Iris          2.6     24     4.5    7      9.0    24     2.6    15     3.4
Phoneme       21.3    30     22.3   9      22.8   30     20.6   20     22.5
Pima          25.3    48     25.2   30     25.2   48     25.8   40     24.4
Abalone       28.0    48     27.6   15     27.2   48     27.7   35     27.8
Contrac.      34.8    54     31.4   15     38.0   54     34.8   29     34.7
Wine          3.7     78     4.8    27     3.2    78     3.2    57     4.5
Hepatitis     20.5    95     21.5   32     21.0   39     18.0   29     19.0
WBCD          5.8     180    5.1    60     5.8    180    5.6    116    5.4
Ionosph.      10.6    198    9.8    49     17.4   198    11.0   177    11.0
SpamBase      15.2    324    13.4   54     15.7   324    15.6   220    14.6
Lung          35.0    318    35.0   74     35.8   318    35.0   135    35.0
Arrhyt.       32.0    1392   51.5   553    30.2   1392   41.5   1050   31.3


Feature Selection
Some comments on existing FS methods:
  Usually, wrappers perform better than embedded methods, but take longer
  Embedded methods perform better than filters, but are much slower
  On very high-dimensional datasets:
    both wrapper and embedded methods are too expensive
    filters are the only applicable option!
    even some filter FS methods can take a prohibitive time in the redundancy analysis and elimination stage


Feature Selection: Relevance-Redundancy (RR)
The Yu and Liu relevance-redundancy framework [3]: the optimal subset is provided by parts III and IV, without both irrelevant and redundant features.

[3] Yu, L., Liu, H., Dec. 2004. Efficient feature selection via analysis of relevance and redundancy. JMLR 5, 1205-1224.


Feature Selection: Relevance-Redundancy (RR)
The Yu and Liu relevance-redundancy framework [4]: first compute relevance, then redundancy; after this, find the optimal subset (parts III and IV).

[4] Yu, L., Liu, H., Dec. 2004. Efficient feature selection via analysis of relevance and redundancy. JMLR 5, 1205-1224.


A RR FS approach for high-dimensional data (ideas)
Key observations for filters on high-dimensional data:
  Some supervised filter methods (e.g., CFS and mrMR) are computationally expensive
  They waste time on subspace search and redundancy analysis
  Redundancy is typically found among the most relevant features
Our proposal for fast unsupervised and supervised filter RR FS on high-dimensional data:
  sorts the d features by decreasing relevance
  computes the redundancy between the most relevant features
  computes up to d-1 pairwise similarities


A RR FS approach for high-dimensional data (ideas)
  We keep only features with high relevance and low similarity (i.e., below some threshold MS) among themselves
  It is not expected that redundant features are consecutive in the ranked (sorted) feature list
  However, it is a waste of time to compute the redundancy between weakly relevant and irrelevant features!
  So, we perform the redundancy check only among the top relevant features


A RR FS approach for high-dimensional data (details)
Input:  X: n x d matrix, n patterns of a d-dimensional training set.
        m (<= d): maximum number of features to keep.
        MS: maximum allowed similarity between pairs of features.
Output: FeatKeep: an m'-dimensional array (with m' <= m) containing the indexes of the selected features.
        X~: n x m' matrix, the reduced-dimensional training set, with features sorted by decreasing relevance.

1: Compute the relevance r_i of each feature X_i (the columns of X), for i in {1, ..., d}, using one of the dispersion measures (MAD or MM).
2: Sort the features by decreasing order of r_i. Let i_1, i_2, ..., i_d be the resulting permutation of {1, ..., d} (i.e., r_{i_1} >= r_{i_2} >= ... >= r_{i_d}).
3: FeatKeep[1] = i_1; prev = 1; next = 2;
4: for f = 2 to d do
5:   s = S(X_{i_f}, X_{i_prev});
6:   if s < MS, keep the feature: FeatKeep[next] = i_f; prev = f; next = next + 1 (the loop ends when next > m or all d features have been checked).
A MATLAB sketch of this procedure is given below.
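A minimal MATLAB sketch (our illustration, not the authors' exact code) of the relevance-redundancy filter described above; relevance is measured by MAD and similarity by the absolute cosine, and the function names are ours.

% Relevance-redundancy (RR) filter sketch: keep at most m features that are
% highly relevant and mutually dissimilar (similarity below MS).
function [FeatKeep, Xr] = rr_fs_sketch(X, m, MS)
  d = size(X, 2);
  r = mean(abs(X - mean(X, 1)), 1);        % MAD relevance of each feature
  [~, order] = sort(r, 'descend');         % rank features by decreasing relevance
  FeatKeep = order(1);                     % always keep the most relevant feature
  prev = 1;                                % ranking index of the last kept feature
  for f = 2:d
    s = abscos(X(:, order(f)), X(:, order(prev)));
    if s < MS                              % low similarity -> keep this feature
      FeatKeep(end + 1) = order(f); %#ok<AGROW>
      prev = f;
    end
    if numel(FeatKeep) >= m, break; end    % keep at most m features
  end
  Xr = X(:, FeatKeep);                     % reduced training set
end

function s = abscos(a, b)
  s = abs(a' * b) / (norm(a) * norm(b));   % absolute cosine similarity
end

For example, [idx, Xr] = rr_fs_sketch(X, 1000, 0.8) would mimic the setting used later in the running-time experiments (MS = 0.8, m = 1000 features).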


Relevance Measures
For unsupervised learning, we found relevance to be proportional to dispersion. The mean absolute difference,

    MAD_i = (1/n) * sum_{j=1}^{n} |X_{ij} - mean(X_i)|,

and the mean-median,

    MM_i = |mean(X_i) - median(X_i)|,

i.e., the absolute difference between the mean and the median of X_i, are adequate measures of relevance.
They attain better results than variance.
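A minimal MATLAB sketch (our illustration) of both relevance measures computed for every column (feature) of an n-by-d data matrix X; the variable names are ours.

mu  = mean(X, 1);                          % per-feature mean (1 x d)
MAD = mean(abs(X - mu), 1);                % mean absolute difference per feature
MM  = abs(mu - median(X, 1));              % |mean - median| per feature
[~, ranking] = sort(MAD, 'descend');       % features ranked by decreasing relevance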


Redundancy Measure
The redundancy between two features, say X_i and X_j, is computed by the absolute cosine

    |cos(X_i, X_j)| = |<X_i, X_j>| / (||X_i|| ||X_j||)
                    = |sum_{k=1}^{n} X_{ik} X_{jk}| / ( sqrt(sum_{k=1}^{n} X_{ik}^2) sqrt(sum_{k=1}^{n} X_{jk}^2) ),    (1)

where <.,.> denotes the inner product and ||.|| the Euclidean norm. We have 0 <= |cos(X_i, X_j)| <= 1:
  0 means that the two features are orthogonal (maximally different)
  1 results from collinear features
This measure is similar to Pearson's correlation coefficient.
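A short MATLAB sketch (our illustration) of equation (1), vectorized to compare the i-th feature against every column of X at once; the variable names are ours.

Xi = X(:, i);
s  = abs(Xi' * X) ./ (norm(Xi) * sqrt(sum(X.^2, 1)));   % 1-by-d, values in [0, 1]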


Experimental Results: Relevance and Redundancy

Figure: the relevance and the similarity of the m = 1000 consecutive top-ranked features of the Brain-Tumor1 dataset (d = 5920 features). Among the top-ranked features, we have high redundancy!


Experimental Results: Unsupervised FS
Comparison with other unsupervised approaches, for 10-fold CV with a linear SVM (in the original slides, the best result is in bold face and the second best is underlined).

               --- Our Approach ---    -- Unsupervised --   Baseline
Dataset        MAD     MM     AMGM     TV     LS     SPEC   No FS
Colon          21.0    17.7   19.4     17.7   21.0   22.6   19.4
SRBCT          0.0     0.0    0.0      0.0    0.0    0.0    0.0
PIE10P         0.0     0.0    0.0      0.0    0.0    0.0    0.0
Lymphoma       2.2     2.2    2.2      2.2    2.2    2.2    2.2
Leukemia1      4.2     4.2    5.6      4.2    5.6    4.2    4.2
Brain-Tumor1   14.4    12.2   12.2     12.2   13.3   28.9   12.2
Leukemia       2.8     2.8    2.8      2.8    2.8    30.6   2.8
Example1       2.7     2.7    2.6      2.7    3.3    22.9   2.7
ORL0P          2.0     2.0    4.0      5.0    1.0    1.0    1.0
Lung-Cancer    4.9     5.9    4.9      5.9    5.4    6.4    4.9
SMK-CAN-187    41.7    41.7   41.7     41.7   26.2   25.7   26.2
Dexter         5.3     5.2    5.2      5.2    5.2    39.3   4.7
GLI-85         12.9    14.1   15.3     14.1   11.8   9.4    11.8
Dorothea       24.0    24.0   24.0     24.0   25.0   22.0   24.0


Experimental Results: Supervised FS
Comparison with other supervised approaches, for 10-fold CV with a linear SVM (in the original slides, the best result is in bold face and the second best is underlined).

           ---- Our Approach ----   ------- Supervised Filters -------   Base
Dataset    MM      FiR     MI       RF      CFS    FCBF    FiR    mrMR   No FS
Colon      24.2    22.6    24.2     19.4    25.8   22.6    19.4   21.0   21.0
SRBCT      0.0     0.0     0.0      0.0     0.0    4.8     0.0    4.8    0.0
PIE10P     0.0     0.0     0.5      0.0     *      1.0     0.0    24.8   0.0
Lymph.     2.2     2.2     2.2      2.2     *      3.3     2.2    22.8   2.2
Leuk1      5.6     2.8     6.9      6.9     *      5.6     4.2    9.7    5.6
B-Tum1     13.3    12.2    13.3     11.1    *      18.9    11.1   25.6   10.0
Leuk.      2.8     12.5    2.8      2.8     *      4.2     4.2    8.3    2.8
Example1   2.3     2.2     2.2      3.7     *      6.3     2.1    28.3   2.4
ORL0P      4.0     5.0     2.0      1.0     *      1.0     2.0    68.0   1.0
B-Tum2     34.0    22.0    30.0     22.0    *      36.0    24.0   42.0   26.0
P-Tumor    7.8     5.9     4.9      7.8     *      9.8     7.8    12.7   8.8
L-Cancer   5.9     6.4     4.9      4.9     *      6.4     5.4    11.8   5.9
SMK-187    41.7    40.6    53.5     24.6    *      33.2    23.5   33.2   24.1
Dexter     6.7     6.0     7.7      9.3     *      15.3    6.7    18.0   6.3
GLI-85     14.1    12.9    17.6     11.8    *      20.0    14.1   16.5   14.1
Dorot.     25.0    26.0    25.0     *       *      *       25.0   *      25.0


Experimental Results: Running Time
For each dataset:
  the first row contains the test set error rates (%) of a linear SVM, for 10-fold CV, for our RR method (with MS = 0.8) and other FS methods selecting m features
  the second row shows the total running time taken by each FS algorithm to select m features over the 10 folds

               - Our RR -   ---------------- Filter ----------------   Embedded
Dataset, m     MAD    FiR   LS     SPEC   RF     FCBF    FiR    mrMR   BReg
Colon          24.2   30.6  21.0   24.2   24.2   25.8    24.2   17.7   21.0
  m=1000       0.3    7.4   0.4    1.5    9.0    147.0   7.1    19.8   2.2
SRBCT          0.0    0.0   0.0    0.0    0.0    1.2     0.0    3.6    2.4
  m=1800       0.5    9.7   0.4    2.3    11.6   8.0     9.3    15.8   5.7
Lymph.         2.2    2.2   2.2    3.3    2.2    3.3     3.3    25.0   8.7
  m=2000       0.8    29.5  0.8    9.0    21.7   25.2    29.2   24.7   143.7
P-Tumor        8.8    6.9   10.8   9.8    5.9    10.8    9.8    13.7   7.8
  m=4000       1.8    21.6  2.4    13.5   47.8   29.1    21.1   48.8   46.1
L-Cancer       5.9    6.4   6.4    8.4    6.9    7.4     6.9    11.3   7.4
  m=8000       3.8    54.9  7.5    47.5   95.5   208.8   54.1   88.5   207.4


Conclusions
  High-dimensional datasets are increasingly common
  Learning from high-dimensional data is challenging
  Unsupervised and supervised FD and FS methods can alleviate the inherent complexity
  Wrapper and embedded methods are too costly
  Our filter FD and FS proposals in this talk:
    are both time and space efficient
    attain competitive results with state-of-the-art techniques
    can act as pre-processors to wrapper and embedded methods
  A promising avenue of research (our ongoing work): perform progressive FD and FS simultaneously!


    Calling Weka from MATLAB - FD


FD example: discretize dataset X using Kononenko's MDL method

...
t = weka.filters.supervised.attribute.Discretize();
config = wekaArgumentString('-R first-last', '-K');          % -K selects Kononenko's MDL criterion
t.setOptions(config);
wekaData = wekaCategoricalData(X, SY2MY(Y));
t.setInputFormat(wekaData);
t.useFilter(wekaData, t);                                    % applies the filter (static Filter.useFilter)
d = size(X, 2);
for i = 1:d
    cutPoints{i} = t.getCutPoints(i-1);                      % cut points of the i-th attribute (Java indices are 0-based)
end
...


    Calling Weka from MATLAB - FS


FS example: apply ReliefF to dataset X

...
t = weka.attributeSelection.ReliefFAttributeEval();
config = wekaArgumentString('-M', m, '-D', 1, '-K', k);      % sample size m, seed 1, k nearest neighbours
t.setOptions(config);
t.buildEvaluator(wekaCategoricalData(X, SY2MY(Y)));
nF = size(X, 2);                                             % number of features
out.W = zeros(1, nF);
for i = 1:nF
    out.W(i) = t.evaluateAttribute(i-1);                     % ReliefF weight of the i-th attribute (0-based in Java)
end
[~, out.fList] = sort(out.W, 'descend');                     % feature indices ranked by decreasing weight
...


    Some useful resources


Datasets available online:
  The University of California at Irvine (UCI) repository, archive.ics.uci.edu/ml/datasets.html
  The Gene Expression Model Selector (GEMS) project, at www.gems-system.org/
  The Arizona State University (ASU) repository, http://featureselection.asu.edu/datasets.php
  The World Health Organization (WHO) data, http://apps.who.int/ghodata/

Machine learning tools online:
  ENTool, machine learning toolbox, http://www.j-wichard.de/entool/
  PRTools machine learning toolbox, Delft University of Technology, http://www.prtools.org/
  Weka, http://www.cs.waikato.ac.nz/ml/weka/
  The ASU FS package, with filter and embedded methods, http://featureselection.asu.edu/software.php
