RESULTS OF THE WCCI 2006 PERFORMANCE PREDICTION CHALLENGE
Isabelle Guyon, Amir Reza Saffari Azar Alamdari, Gideon Dror
Part I
INTRODUCTION
Model selection
• Selecting models (neural net, decision tree, SVM, …)
• Selecting hyperparameters (number of hidden units, weight decay/ridge, kernel parameters, …)
• Selecting variables or features (space dimensionality reduction).
• Selecting patterns (data cleaning, data reduction, e.g. by clustering).
Performance prediction
How good are you at predicting
how good you are?
• Practically important in pilot studies.
• Good performance predictions render model selection trivial.
Why a challenge?
• Stimulate research and push the state-of-the art.
• Move towards fair comparisons and give a voice to methods that work but may not be backed up by theory (yet).
• Find practical solutions to real problems.
• Have fun…
History
• USPS/NIST.
• Unipen (with Lambert Schomaker): 40 institutions share 5 million handwritten characters.
• KDD cup, TREC, CASP, CAMDA, ICDAR, etc.
• NIPS challenge on unlabeled data.
• Feature selection challenge (with Steve Gunn): success! ~75 entrants, thousands of entries.
• Pascal challenges.
• Performance prediction challenge …
[Timeline graphic spanning 1980–2005]
Challenge
• Date started: Friday September 30, 2005.
• Date ended: Monday March 1, 2006.
• Duration: 21 weeks.
• Estimated number of entrants: 145.
• Number of development entries: 4228.
• Number of ranked participants: 28.
• Number of ranked submissions: 117.
Datasets
Dataset  Domain          Type           Features  Train. ex.  Valid. ex.  Test ex.
ADA      Marketing       Dense                48        4147         415     41471
GINA     Digits          Dense               970        3153         315     31532
HIVA     Drug discovery  Dense              1617        3845         384     38449
NOVA     Text classif.   Sparse binary     16969        1754         175     17537
SYLVA    Ecology         Dense               216       13086        1308    130858
http://www.modelselect.inf.ethz.ch/
BER distribution
[Histograms of the test BER distribution for ADA, GINA, HIVA, NOVA, SYLVA; x-axis: test BER (0 to 0.5), y-axis: number of entries]
Results
Overall winners for ranked entries:
• Ave rank: Roman Lutz with LB tree mix cut adapted
• Ave score: Gavin Cawley with Final #2
• ADA: Marc Boullé with SNB(CMA) + 10k F(2D) tv or SNB(CMA) + 100k F(2D) tv
• GINA: Kari Torkkola & Eugene Tuv with ACE+RLSC
• HIVA: Gavin Cawley with Final #3 (corrected)
• NOVA: Gavin Cawley with Final #1
• SYLVA: Marc Boullé with SNB(CMA) + 10k F(3D) tv
• Best AUC: Radford Neal with Bayesian Neural Networks
Part II
PROTOCOL AND SCORING
Protocol
• Data split: training/validation/test.
• Data proportions: 10/1/100.
• Online feedback on validation data.
• Validation label release one month before the end of the challenge.
• Final ranking on test data using the five last complete submissions for each entrant.
Performance metrics
• Balanced Error Rate (BER): average of error rates of positive class and negative class.
• Guess error: ΔBER = abs(testBER − guessedBER).
• Area Under the ROC Curve (AUC).
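As a concrete illustration, here is a minimal Matlab sketch of how the BER and the guess error could be computed (the variable names y_true, y_pred and guessed_ber are ours, purely for illustration, not part of the challenge kit):

  % y_true, y_pred: vectors of +1/-1 labels (true and predicted).
  pos = (y_true == 1);
  neg = (y_true == -1);
  err_pos = mean(y_pred(pos) ~= 1);        % error rate on the positive class
  err_neg = mean(y_pred(neg) ~= -1);       % error rate on the negative class
  test_ber = 0.5 * (err_pos + err_neg);    % Balanced Error Rate
  delta_ber = abs(test_ber - guessed_ber); % guess error (deltaBER)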
Optimistic guesses
[Scatter plot: guessed BER (y-axis) vs. test BER (x-axis) for ADA, GINA, HIVA, NOVA, SYLVA; points below the diagonal are optimistic guesses]
Scoring method
E = testBER + ΔBER × [1 − exp(−ΔBER/σ)],   ΔBER = abs(testBER − guessedBER)
[Plot: challenge score E as a function of the guessed BER, for a fixed test BER]
[Log-scale plot: ΔBER/σ vs. test BER for ADA, GINA, HIVA, NOVA, SYLVA; when ΔBER/σ is large, E ≈ testBER + ΔBER]
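For reference, a small Matlab sketch of the scoring formula as we read it from the slide, where sigma is taken to be the error bar on the test BER (our reading; test_ber, guessed_ber and sigma are illustrative variable names):

  % Challenge score: test BER plus a penalty that only matters when the
  % guess error deltaBER is large relative to the error bar sigma.
  delta_ber = abs(test_ber - guessed_ber);
  E = test_ber + delta_ber * (1 - exp(-delta_ber / sigma));
  % deltaBER << sigma  =>  E ~ testBER;   deltaBER >> sigma  =>  E ~ testBER + deltaBER.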
Score
[Plot: challenge score on GINA as a function of log(γ) for entries by Roman Lutz, Gavin Cawley, Radford Neal, Corinne Dahinden, Wei Chu and Nicolai Meinshausen; E = testBER + ΔBER × [1 − exp(−ΔBER/σ)] moves between its two limits, testBER and testBER + ΔBER]
Score (continued)
[Plots: challenge score as a function of log(γ) for ADA, GINA, HIVA, NOVA and SYLVA]
Part III
RESULT ANALYSIS
What did we expect?
• Learn about new competitive machine learning techniques.
• Identify competitive methods of performance prediction, model selection, and ensemble learning (theory put into practice).
• Drive research in the direction of refining such methods (ongoing benchmark).
Method comparison
[Log-scale scatter plot: ΔBER vs. test BER, with entries grouped by method family (legend: X, TREE, NN/BNN, NB, LD/SVM/KLS/GP) and by dataset (ADA, GINA, HIVA, NOVA, SYLVA)]
Danger of overfitting
[Plot: BER as a function of time (days) over the course of the challenge for ADA, GINA, HIVA, NOVA, SYLVA; full line: test BER, dashed line: validation BER]
How to estimate the BER?
• Statistical tests (Stats): Compute it on training data; compare with a “null hypothesis” e.g. the results obtained with a random permutation of the labels.
• Cross-validation (CV): Split the training data many times into training and validation sets; average the validation results (a minimal sketch is given below).
• Guaranteed risk minimization (GRM): Use of theoretical performance bounds.
Stats / CV / GRM ???
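To make the CV option concrete, here is a minimal k-fold cross-validation sketch in Matlab (our own illustration; train_model and predict_model stand for any hypothetical learning machine and are not part of the challenge software):

  % k-fold cross-validation estimate of the BER.
  % X: n-by-d data matrix, Y: n-by-1 vector of +1/-1 labels.
  k = 10;
  n = length(Y);
  folds = mod(randperm(n), k) + 1;          % random assignment of examples to k folds
  fold_ber = zeros(k, 1);
  for i = 1:k
      val_idx   = (folds == i);             % held-out fold
      train_idx = ~val_idx;
      model = train_model(X(train_idx, :), Y(train_idx));  % hypothetical trainer
      Yhat  = predict_model(model, X(val_idx, :));         % hypothetical predictor
      Yval  = Y(val_idx);
      fold_ber(i) = 0.5 * (mean(Yhat(Yval == 1) ~= 1) + mean(Yhat(Yval == -1) ~= -1));
  end
  cv_ber = mean(fold_ber);                  % cross-validation estimate of the test BER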
Top ranking methods
• Performance prediction:
  – CV with many splits, 90% train / 10% validation
  – Nested CV loops
• Model selection:
  – Use of a single model family
  – Regularized risk / Bayesian priors
  – Ensemble methods
  – Nested CV loops, made computationally efficient with virtual leave-one-out (VLOO; see the sketch below)
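Virtual leave-one-out exploits the closed-form leave-one-out residuals available for ridge-type models, so leave-one-out estimates cost no retraining. A minimal sketch for ordinary ridge regression with ±1 targets (our own illustration, not the participants' actual code; X, Y and lambda are assumed inputs):

  % VLOO for ridge regression: the leave-one-out residual of example i equals
  % the ordinary residual divided by (1 - H_ii), with H the ridge hat matrix.
  lambda = 0.1;                                   % ridge (weight decay) parameter
  [n, d] = size(X);
  H = X * ((X' * X + lambda * eye(d)) \ X');      % hat matrix of ridge regression
  Yhat = H * Y;                                   % fitted values on the training set
  loo_pred = Y - (Y - Yhat) ./ (1 - diag(H));     % closed-form leave-one-out predictions
  loo_error = mean(sign(loo_pred) ~= Y);          % leave-one-out misclassification rate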
Other methods
• Use of training data only:
  – Training BER.
  – Statistical tests.
• Bayesian evidence.
• Performance bounds.
• Bilevel optimization.
Part IV
CONCLUSIONS AND FURTHER WORK
Open problems
Bridge the gap between theory and practice…
• What are the best estimators of the variance of CV?
• What should k be in k-fold?
• Are other cross-validation methods better than k-fold (e.g. bootstrap, 5x2CV)?
• Are there better “hybrid” methods?
• What search strategies are best?
• More than 2 levels of inference?
Future work
• Game of model selection.
• JMLR special topic on model selection.
• IJCNN 2007 challenge!
Benchmarking model selection?
• Performance prediction: Participants just need to provide a guess of their test performance. If they can solve that problem, they can perform model selection efficiently. Easy and motivating.
• Selection of a model from a finite toolbox: In principle a more controlled benchmark, but less attractive to participants.
CLOP
• CLOP=Challenge Learning Object Package.
• Based on the Spider developed at the Max Planck Institute.
• Two basic abstractions:
  – Data object
  – Model object
http://clopinet.com/isabelle/Projects/modelselect/MFAQ.html
CLOP tutorial
At the Matlab prompt:

  D = data(X, Y);                              % wrap inputs X and labels Y in a data object
  hyper = {'degree=3', 'shrinkage=0.1'};       % hyperparameters of the kernel ridge model
  model = kridge(hyper);                       % create a kernel ridge regression model object
  [resu, model] = train(model, D);             % train the model on D; resu holds the results
  tresu = test(model, testD);                  % apply the trained model to test data testD
  model = chain({standardize, kridge(hyper)}); % chain standardization with kernel ridge
Conclusions
• Twice the volume of participation of the feature selection challenge.
• Top methods as before (in a different order):
  – Ensembles of trees
  – Kernel methods (RLSC/LS-SVM, SVM)
  – Bayesian neural networks
  – Naïve Bayes
• Danger of overfitting.
• Triumph of cross-validation?