78
EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE LEARNING, BUT WERE FORCED TO FIND OUT Ivan Štajduhar [email protected] SSIP 2019 27 TH SUMMER SCHOOL ON IMAGE PROCESSING, TIMISOARA, ROMANIA July 10 th 2019

EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE LEARNING,

BUT WERE FORCED TO FIND OUT

Ivan Š[email protected]

SSIP 201927TH SUMMER SCHOOL ON IMAGE PROCESSING, TIMISOARA, ROMANIA

July 10th 2019

Page 3: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm
Page 4: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

INTRODUCTION AND MOTIVATION

EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE LEARNING, BUT WERE FORCED TO FIND OUT

Page 5: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm
Page 6: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Challenges

Page 7: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Solution

• Model-based techniques– Manually tailored

– Variation and complexity in clinical data

– Limited by current insights into clinical conditions, diagnostic modelling and therapy

– Hard to establish analytical solutions

Page 8: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Hržić, Franko, et al. "Local-Entropy Based Approach for X-Ray ImageSegmentation and Fracture Detection." Entropy 21.4 (2019): 338. (uniri-tehnic-18-15)

Page 9: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Machine learning

• Model-based techniques– Manually tailored

– Variation and complexity in clinical data

– Limited by current insights into clinical conditions, diagnostic modelling and therapy

– Hard to establish analytical solutions

• An alternative: learning from data– Minimising an objective function

Page 10: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

PACS

PET US

MRICT

Page 11: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Summary

• Introduction and motivation

• Representation, optimisation & stuff

• Evaluation metrics & experimental setup

• Improving model performance

Page 12: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

REPRESENTATION, OPTIMISATION & STUFF

EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE LEARNING, BUT WERE FORCED TO FIND OUT

Page 13: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Machine learning

• Machine learning techniques mainly deal with representation, performance assessment and optimisation:

– The learning process is always preceded by the choice of a formal representation of the model. A set of possible models is called the space of the hypothesis.

– Learning algorithm uses a cost function to determine (evaluate) how successful a model is

– Optimisation is the process of choosing the most successful models

Page 14: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Hypothesis

• Learning type: supervised vs unsupervised

• Hypothesis type: regression vs classification

continuousoutcome

categoricaloutcome

labelleddata

unlabelleddata

Page 15: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Hypothesis

Data

Learning algorithm

hObservation

(known variables, easily obtainable)

Outcome(prediction)

Page 16: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

16

Page 17: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

featureextraction

predictor

Page 18: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Hypothesis and parameter estimation

Page 19: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Hypothesis and parameter estimation

Page 20: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Regularisation

• A way of reducing overfitting by ignoring non-informative features

• What is overfitting?

• Many types of regularisation– Quadratic regulariser

Page 21: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Multilayer perceptron (MLP)

• An extension of the logistic-regression idea

Page 22: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Multilayer perceptron (MLP)

• Parameters estimated through backpropagation algorithm

evidence

error CS231n: Convolutional Neural Networks for Visual Recognition, Stanford Universityhttp://cs231n.stanford.edu/2016/

Page 23: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Multilayer perceptron (MLP)

CS231n: Convolutional Neural Networks for Visual Recognition, Stanford Universityhttp://cs231n.stanford.edu/2016/

Convolutional layer

Normalisation layer

Activation function

Fully-connected layer

Normalisation layer

Activation function

Page 24: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Support vector machine (SVM)

• Maximum margin classifier

Page 25: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Support vector machine (SVM)

• Often used with kernels for dealing with linearly non-separable problems

• Quadratic programming solver optimisation

Page 26: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Support vector machine (SVM)

Page 27: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Kernels

• A possible way of dealing with linearly non-separable problems

• A measure of similarity between data points

• Kernel trick can implicitly escalate dimensionality

Page 28: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Tree models

• An alternative form of mapping

• Partition the input space into cuboid regions

– Can be easily interpretable

Temperature

Skyyesyes

yesno no

coldmoderate

hot

cloudysunny

rainy

Page 29: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Tree models

• An alternative form of mapping

• Partition the input space into cuboid regions

– Can be easily interpretable

Page 30: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

CART

• CART hypothesis

• Local optimisation (greedy) – recursive partitioning

• Cost function

Page 31: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

CART|

|

Page 32: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Short recap

• Representation, optimisation & stuff

|

Page 33: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm
Page 34: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

EVALUATION METRICS &EXPERIMENTAL SETUP

EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE LEARNING, BUT WERE FORCED TO FIND OUT

Page 35: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Evaluation metrics

• The choice of an adequate model-evaluation metric depends on the modelling goal

• Common metrics for common problems:– Mean squared error (for regression)

– Classification accuracy

Page 36: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Evaluation metrics

• Confusion matrix– binary case

– multiple classes (K>2)

Predicted outcome

Ob

serv

ed o

utc

om

e

Confusion matrix (normalised)

36

Predictedoutcome

Predictedoutcome

Observedoutcome

Truepositive

(TP)

False negative

(FN)

Observed outcome

False positive

(FP)

True negative

(TN)

outcome = class = label

Page 37: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Evaluation metrics

• Common metrics for class-imbalanced problems, or when misclassifications are not equally bad:– Sensitivity (recall, true positive rate)

– Specificity

– F1 score

37

Predictedoutcome

Predictedoutcome

Observedoutcome

Truepositive

(TP)

False negative

(FN)

Observed outcome

False positive

(FP)

True negative

(TN)

Page 38: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Fawcett, Tom. "An introduction to ROC analysis." Pattern recognition letters 27.8 (2006): 861-874.

Predictedoutcome

Predictedoutcome

Observedoutcome

Truepositive

(TP)

False negative

(FN)

Observed outcome

False positive

(FP)

True negative

(TN)

Evaluation metrics

• For class-imbalanced problems, or when misclassifications are not equally bad (probabilistic classification):– Receiver operating characteristic (ROC) curve

1-SPEC

Page 39: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Evaluation metrics

• For highly-skewed class distributions (probabilistic classification):– Precision-recall (PR) curve

– Area under the curve (AUC)

Davis, Jesse, and Mark Goadrich. "The relationship between Precision-Recall and ROC curves." Proceedings of the 23rd international conference on Machine learning. ACM, 2006.

AUROC

Page 40: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Evaluation metrics

• Uncertain labellings, e.g., censored survival data

• Kaplan-Meier estimate of a survival function

• Alternative evaluation metrics:– Log-rank test

– Concordance index

– Explained residual variation of integrated Brier score

LOWHIGH

Risk group:

Kaplan-Meier estimate

Page 41: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Experimental setup

• Setting up an unbiased experiment– How well will the model perform on new, yet unseen, data?

• Dataset split:

• n-fold cross-validation or leave-one-out test

• Fold stratification of class-wise distribution

• Multiple iterations of fold splits

train test

training

Page 42: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

training

Experimental setup

• Estimating a model wisely– validation data used for tuning hyperparameters

Data

Learning algorithm

Mo

A

Page 43: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Experimental setup

• Fair estimate of a classifier / regression model– data preprocessing can bias your conclusions

– watch out for naturally correlated observations, e.g.• diagnostic scan of a patient now and before

• different imaging modalities of the same subject

– artificially generated data can cause a mess

– use testing data only after you are done with the training

• Fair estimate of a segmenter– splitting pixels of the same image into separate sets is usually not the best

idea

– also, use a separate testing set of images

How well will the model perform on new, yet unseen, data?

training

Page 44: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

POPULATION

Method performance comparison

• You devised a new method– What makes your method better?

– Is it significantly better?

• Is the sample representative of a population?

• Hypothesis: It is better (maybe)

Page 45: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Method performance comparison

• Hypothesis: A equals B!– Non-parametric statistical tests

– A level of significance α is used to determine at which level the hypothesis may be rejected

– The smaller the p-value, the stronger the evidence against the null hypothesis

Demšar, Janez. "Statistical comparisons of classifiers over multiple data sets." Journal of Machine learning research 7.Jan (2006): 1-30.

Derrac, Joaquín, et al. "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms." Swarm and Evolutionary Computation 1.1 (2011): 3-18.

Page 46: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Two classifiers

• Comparing performance against baseline (over multiple datasets)

• Wilcoxon signed ranks test

Demšar, Janez. "Statistical comparisons of classifiers over multiple data sets." Journal of Machine learning research 7.Jan (2006): 1-30.

Page 47: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Multiple classifiers

• Comparing performance of multiple classifiers (over multiple datasets)

• Friedman test

• Post-hoc tests

Demšar, Janez. "Statistical comparisons of classifiers over multiple data sets." Journal of Machine learning research 7.Jan (2006): 1-30.

Page 48: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Multiple classifiers

• Friedman test

Demšar, Janez. "Statistical comparisons of classifiers over multiple data sets." Journal of Machine learning research 7.Jan (2006): 1-30.

Page 49: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Multiple classifiers

• Post-hoc tests

• All-vs-all

– Nemenyi test

• One-vs-all

– Bonferroni-Dunn test

Demšar, Janez. "Statistical comparisons of classifiers over multiple data sets." Journal of Machine learning research 7.Jan (2006): 1-30.

worse

Page 50: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm
Page 51: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

IMPROVING MODEL PERFORMANCE

EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE LEARNING, BUT WERE FORCED TO FIND OUT

Page 52: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Tuning hyperparameters

• How do you know if you have chosen the right hyperparameters?

• Empirical (training) cost

– How well can we fit the existing data

• Expected (validation) cost

– What we can expect in reality

• Substitute cost with arbitrary evaluation metric

Data

Learning algorithm

Mo

A

Page 53: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Overfitting and underfitting

Page 54: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Bias-variance tradeoff

Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. The elements of statistical learning. Vol. 1. No. 10. New York, NY, USA:: Springer series in statistics, 2001.

bia

s

variance

Page 55: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Tuning hyperparameters

• Trial and error

• Gridsearch CV Data

Learning algorithm

Mo

A

Page 56: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Tuning hyperparameters

• Where applicable, track goal convergence

Hypothesis complexity

Co

st

Number of iterationsC

ost

Page 57: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Lowering bias and variance

• Bias or variance still not constrained?– Add new features

– Reduce problem complexity

– Generate more data

– Use transfer learning

– Use ensembles

Page 58: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Reducing problem complexity

• Make the learning problem easier through data preprocessing– Spatial data preprocessing

– Feature extraction

Page 59: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

OppositeorientationR

edu

ce p

rob

lem

co

mp

lexi

ty

Page 60: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Generating more data

• Will generating more data help?– Check using a learning curve plot

• If possible, acquire more data and increase model complexity– Alternatively, generate artificial (synthetic) data

Page 61: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Using transfer learning

Tajbakhsh, Nima, et al. "Convolutional neural networks for medical image analysis: full training or fine tuning?." IEEE transactions on medical imaging 35.5 (2016): 1299-1312.

Page 62: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Biological justification – visual cortex

Hubel, David H., and Torsten N. Wiesel. "Receptive fields of single neurones in the cat's striate cortex." The Journal of physiology 148.3 (1959): 574-591.

Page 63: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm
Page 64: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Using ensembles

Page 65: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Using ensembles

• Ensemble:– Multiple submodels combined to make one strong model

– Sample the training data differently for each submodel

– Define a voting/averaging politic over submodels

• Bagging vs boosting

Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. The elements of statistical learning. Vol. 1. No. 10. New York, NY, USA:: Springer series in statistics, 2001.

Page 66: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Bagging

• Bagging (or bootstrap aggregation)– A technique for reducing the variance of an estimated model

– Averaging reduces variance and leaves bias unchanged

– Works especially well for high-variance, low-bias procedures (e.g. trees)

• 50 members vote in 10 categories (4 nominations)

• 15 members are (somewhat) informed (random for category)

• Majority wins

Page 67: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Random forests

• If a smaller subset of features is far more informative than the rest – loss of diversity in the ensemble

• At branching, randomly pick a subset of features and bootstrap the dataset– recursively partition

Page 68: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Boosting

• Boosting

– Similar to bagging, but involving vote weighting

– Each weak-model accuracy only slightly better than random guessing

• Adaboost.M1

Temperature = “hot”

yesno

NOYES

Page 69: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm
Page 70: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Gradient Boosting

• Learn each new model explicitly from the error of the previous additive model, i.e. use additive training

Page 71: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Gradient Boosting

• Optimal additive model obtained by calculating the gradient on the residual of the previous additive model

• XGboost ??

?...

Chen, Tianqi, and Carlos Guestrin. "Xgboost: A scalable tree boosting system." Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining. ACM, 2016.

Page 72: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Applicability of learned models

• Good performances

• Handling missing values

• Noise handling

• Transparency of diagnostic knowledge

• Explanatory capabilities

• Reducing the number of tests

Kononenko, Igor. "Machine learning for medical diagnosis: history, state of the art and perspective." Artificial Intelligence in medicine 23.1 (2001): 89-109.

Page 73: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Noise

• Errors in data– Regularisation

• Errors in labels (supervised learning)– Ground truth availability (garbage in – garbage out)

• Do you expect this (type of) noise in the future?– E.g. a flaw in the data acquisition method

• Outlier removal in a (one-time) preprocessing step– E.g. inspect the univariate or multivariate normal distribution

Page 74: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

• Minimum (end-systolic) and maximum (end-diastolic) volumes and ejection fraction estimation

• Diabetic retinopathy detection

• Predicting length of hospital stay

• Brachial Plexus nerve segmentation in ultrasound images

• Identifying risk population cervical cancer

Page 75: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

• Understand the data

• Define what you want to accomplish

• Choose an appropriate hypothesis and optimisation technique– Do not overdo it

– SVMs and RFs are the best choice in most cases

• Stay unbiased (experimentally)

• Track the learning process

• Reduce bias and variance

• Perform a statistical comparison

test

train

Page 76: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Hmmm... Experimental results suggest the model is good

Finally, I might be getting my big break...

Page 77: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

Other sources (of figures, mostly)

BOOKS:

• Christopher Bishop, Pattern Recognition and Machine Learning. Springer-Verlag New York, 2006.

• Duda, Richard O., Peter E. Hart, and David G. Stork. Pattern classification. John Wiley & Sons, 2012.

• Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. The elements of statistical learning. Vol. 1. No. 10. New York, NY, USA:: Springer series in statistics, 2001.

JOURNALS AND PROCEEDINGS:

• D. Gering. W. Lu, K. Ruchala, G. Olivera. Utilizing Shape Models Composed of Geometric Primitives for Organ Segmentation. American Association of Physicists in Medicine (AAPM), Anaheim, CA, July 2009.

• Enquobahrie, Andinet, et al. "The image-guided surgery toolkit IGSTK: an open source C++ software toolkit." Journal of Digital Imaging 20.1 (2007): 21-33.

• Müller, Henning, Patrick Ruch, and Antoine Geissbuhler. "Enriching content-based image retrieval with multi-lingual search terms." Swiss Medical Informatics 54 (2005): 6-11.

• Yang, Liu, et al. "A boosting framework for visuality-preserving distance metric learning and its application to medical image retrieval." IEEE Transactions on Pattern Analysis and Machine Intelligence 32.1 (2010): 30-44.

OTHER:

• http://cecs.wright.edu/~agoshtas/OMI.html

• http://www.diagnijmegen.nl/index.php/Computer-Aided_Diagnosis_of_Retinal_Images_(CADR)

• https://radiopaedia.org/

• http://scott.fortmann-roe.com/docs/BiasVariance.html

• Andrea Vedaldi, (Somewhat) Advanced Convolutional Neural Networks, MISS 2016

• Andrew Ng, Machine Learning, Coursera 2016

• https://www.kaggle.com/c/dogs-vs-cats/discussion/6984

• http://hmcbee.blogspot.hr/2015/06/toddlers-in-transistors-teaching.html

• https://xkcd.com/

• https://sebastianraschka.com/blog/2016/model-evaluation-selection-part3.html

• Wikipedia *

• Deviantart *

• Meme sites *

Page 78: EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE … · "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm

EVERYTHING YOU NEVER WANTED TO KNOW ABOUT MACHINE LEARNING,

BUT WERE FORCED TO FIND OUT

Ivan Š[email protected]

SSIP 201927TH SUMMER SCHOOL ON IMAGE PROCESSING, TIMISOARA, ROMANIA

July 10th 2019