9/4/17
Machine Learning for Clinical Care Overview
Fall 2017 Thursday, August 31
Outline
• Describe modeling
• Introduce learning goals
• Supervised Machine Learning
• Unsupervised Machine Learning
• R or Python (2.7 – Anaconda)
Lesson’s Goals
• Question(s):
– What is Machine Learning?
– How is it used in Clinical Data?
– How can it be used in systems?
• Goal(s):
– Learn the difference between Supervised and Unsupervised Machine Learning
– Learn the metrics that are used to evaluate model effectiveness
– Learn what packages are available in R and Python
What is Machine Learning?
• Algorithms/methods for statistical learning from data
• In the context of medical analytics:
– Pattern recognition to calculate the probability/risk of certain events
– Match similar individuals to compare outcomes
Valuable References
• TL: Reference for tools with WEKA
• BL: Good reference on metrics and classifiers
• TR: Used in 633 – good mathematical reference
• BR: Great statistical machine learning text – theory-based
Supervised vs. Unsupervised
• Unsupervised: use the data to determine some form of underlying organization or structure
• Supervised: pattern recognition/regression. Knowing a specific value:
– Y = f(X)
– Being able to predict p(y′ | X′) to generate y′ = f(X′)
Unsupervised Machine Learning
• Clustering
• Determining similarity
• Learning a meaningful representation of data from a large, noisy set of data
• CSCE 633 – Machine Learning, or a similar course
• Examples:
– K-Means, Ward's hierarchical clustering, expectation maximization
– Packages: R: mclust, hclust; Python: scikit-learn
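To make the scikit-learn side concrete, a minimal K-Means sketch might look like the following. The two-group synthetic data and all variable names are illustrative, not from the lecture.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two well-separated synthetic "patient" groups in a 2-D feature space
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_              # cluster assignment for each row of X
centers = km.cluster_centers_    # one centroid per cluster
```

With no labels provided, the algorithm recovers the two groups purely from the structure of the data, which is the point of the unsupervised setting described above.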
Supervised Machine Learning
• Know the difference between two classes of objects (multiclass extends from this)
• Want to find the function that defines one versus the other
• (Regression slightly different)
Supervised Machine Learning: an example from Duda/Stork
Supervised Machine Learning: Finding the right input feature
Supervised Machine Learning: using multiple input features
Supervised Machine Learning: non-linear models and overfitting

Machine Learning Pipeline
• Collect Input Data (e.g. body-wearable sensors)
• Segment Appropriately (e.g. w.r.t. time)
• Extract Input Features (reduce dimensionality)
• Train Machine Learning Model
• Validate
• Adjust Decision Threshold based upon Costs
Inputting Data
• Data might be of various types
• Too many dimensions – how do you reduce?
Dimensionality Reduction: PCA
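The original PCA slide is an image; a minimal scikit-learn sketch of the same idea (on synthetic data, with an artificially correlated feature pair) could be:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))                              # 100 samples, 10 features
X[:, 1] = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=100)   # make two features correlated

pca = PCA(n_components=3).fit(X)
X_reduced = pca.transform(X)               # project onto the top 3 components
ratios = pca.explained_variance_ratio_     # fraction of variance each component captures
```

Because feature 1 is nearly a copy of feature 0, PCA folds them into one component, which is exactly the dimensionality-reduction effect the slide illustrates.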
Dimensionality Reduction: Filter Methods
• Use statistical tests to pick the most important variables
• Likelihood ratio test (and p-value)
• Information gain
• Correlation
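Scoring each feature with a univariate statistical test, as above, can be sketched with scikit-learn's SelectKBest. The ANOVA F-test used here stands in for the tests listed on the slide; the data is synthetic.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 8))
X[:, 3] += 2.0 * y                 # make feature 3 strongly class-dependent

# Score every feature independently, then keep only the top k
selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
X_selected = selector.transform(X)
chosen = selector.get_support(indices=True)
```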
Dimensionality Reduction: Wrapper Methods
• Build machine learning methods with various sets of variables and select the one with the best accuracy
• How do you measure accuracy?
• How do you add features? In what order?
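One common wrapper-style answer to "in what order?" is recursive feature elimination: repeatedly refit the model and drop the weakest feature. A hedged scikit-learn sketch on synthetic data:

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 6))
X[:, 0] += 2.0 * y                 # only feature 0 carries real signal

# Fit, eliminate the lowest-weight feature, refit - until 2 features remain
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2).fit(X, y)
mask = rfe.support_                # True for the features that were kept
ranking = rfe.ranking_             # 1 = selected; higher = eliminated earlier
```

Unlike filter methods, the selection here is driven by the model itself, which is what distinguishes wrappers.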
Data Setup
• What happens when features are:
– 0/1 binary patient histories?
– 1/2/3/4 categorical variables with no ordinal nature?
– Continuous values in different numeric ranges?
Inputting Data: Normalization
• Is it best to leave features in their own numeric ranges?
• Put everything in a simple [0,1] interval? [-1,1]?
• Normalize by standard centering and scaling?
• Convert all categorical variables to one-hot binary encoding?
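Two of these options - standard centering/scaling and one-hot encoding - look like this in scikit-learn (the tiny example values are invented for illustration):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# A continuous feature in its own numeric range (e.g. a lab value)
cont = np.array([[120.0], [140.0], [95.0], [160.0]])
scaled = StandardScaler().fit_transform(cont)      # center to mean 0, scale to unit variance

# An unordered categorical feature -> one-hot binary columns
cats = np.array([["A"], ["B"], ["A"], ["C"]])
onehot = OneHotEncoder().fit_transform(cats).toarray()
```

One-hot encoding avoids imposing a false ordering (1 < 2 < 3 < 4) on categorical codes, which answers the "no ordinal nature" question from the previous slide.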
Inputting Data: Missing Data
• Data will often be missing for a variety of reasons
• Machine learning algorithms will need complete data sets
• Can impute in a variety of ways:
– Median value
– Mean value
– Create a machine learning function from complete data to estimate the value
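The median-value strategy is a one-liner with scikit-learn's SimpleImputer (toy matrix invented for illustration):

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0,    2.0],
              [np.nan, 3.0],
              [7.0,    np.nan]])

# Replace each missing entry with its column's median
X_imp = SimpleImputer(strategy="median").fit_transform(X)
```

Swapping strategy="median" for "mean" gives the mean-value option; the model-based option on the slide would instead fit a predictor on the complete rows.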
Outputs
• Supervised learning: p(y′ | X′)
• Often, classification algorithms will output 0/1 (-1/1) – what are they doing?
• Decision threshold at 50%
• This class is concerned with the probability values
Logistic Regression
• Linear, direct regression
• What does this do to restrict inputs?
• How does it optimize the coefficients?
• f(x) is a probability
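The relationship between the probability output and the 0/1 label can be made explicit; a sketch on synthetic data, assuming scikit-learn's LogisticRegression:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

clf = LogisticRegression().fit(X, y)
probs = clf.predict_proba(X)[:, 1]       # P(y=1 | x), the quantity this class cares about
labels = (probs >= 0.5).astype(int)      # the default 50% decision threshold
```

The 0/1 output of predict() is just predict_proba() thresholded at 50%; moving that threshold is what the "adjust decision threshold based upon costs" pipeline step refers to.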
Regression: Loss function
9/4/17
5
Regression: Loss function: https://courses.cs.washington.edu/courses/cse547/16sp/slides/logistic-SGD.pdf
Ensemble Methods
• Multiple learners for the same problem
• Converge onto the right learner
• Can select features while it builds
• Regularization! (Lasso and Elastic Net)
LR with Lasso
• Great starting point for all work
• Important parameter to learn: lambda
• Python:
– scikit-learn: LogisticRegression, SGDClassifier, SGDRegressor (slight issue with sample weights)
• R package: glmnet
– Use cv.glmnet to find the right model + variables
– coef to get the predicted coefficients
– parallel=TRUE option important for R (with the doParallel and foreach packages)
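On the Python side, a glmnet-style lasso fit can be approximated with scikit-learn's L1-penalized LogisticRegression. Note that scikit-learn's C is roughly the inverse of glmnet's lambda: smaller C means stronger shrinkage. Sketch on synthetic data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(int)

# penalty="l1" is the lasso; small C (large lambda) zeroes out weak features
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
n_nonzero = int(np.sum(lasso.coef_ != 0))   # surviving (selected) features
```

The zeroed coefficients are how the lasso "selects features while it builds", per the ensemble-methods slide.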
Random Forest
Source: http://www.iis.ee.ic.ac.uk/icvl/iccv09_tutorial.html
Random Forest: Advantages
• No need to normalize variables
• No need to standardize variables
• Handles categorical variables well
• Multi-class classification
• Probabilities based upon leaf nodes
• Internally tests variable importance and generates rank
• Packages:
– Python: scikit-learn
– R: randomForest, importance=TRUE
– Ranking of variables by Gini or by mean decrease in accuracy
– Important param: number of trees
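The scikit-learn version of these points (leaf-based probabilities, built-in importance ranking, the number-of-trees parameter) can be sketched on synthetic data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 2] > 0).astype(int)            # only feature 2 actually matters

# n_estimators is the "number of trees" parameter the slide calls important
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
probs = rf.predict_proba(X)[:, 1]        # probabilities from leaf-node votes
imp = rf.feature_importances_            # Gini-based importance, sums to 1
```

The importances correctly single out feature 2, illustrating the "internally tests variable importance" bullet without any separate feature-selection step.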
Gradient Descent Boosting
Source: http://www.iis.ee.ic.ac.uk/icvl/iccv09_tutorial.html
Gradient Descent Boosting
• Package: XGBoost (R and Python)
• Understanding variable importance is tricky with xgboost
• Important parameters:
– Number of trees
– Learning rate (eta)
– Max depth of each tree
– Objective: ‘binary:logistic’
– nthread (parallelization)
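XGBoost itself may not be installed everywhere, but scikit-learn's GradientBoostingClassifier exposes the analogous knobs (n_estimators ≈ number of trees, learning_rate ≈ eta, max_depth), so a stand-in sketch of the same parameter choices on synthetic data is:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# n_estimators ~ number of trees, learning_rate ~ eta, max_depth as in xgboost
gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 max_depth=3, random_state=0).fit(X, y)
acc = gbm.score(X, y)
```

The xgboost API takes the same three ideas as num_boost_round / eta / max_depth, plus the objective and nthread settings listed above.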
Other Popular Methods
• Neural Networks
• Support Vector Machines
• Nearest Neighbor classifiers
• Other boosting methods
• Hierarchical Classifiers
Support Vector Machine
h\p://docs.opencv.org/doc/tutorials/ml/introducTon_to_svm/introducTon_to_svm.html
SVM
• e1071 in R; scikit-learn/libSVM in Python
• Requires prior feature selection techniques (otherwise ripe for overfitting)
• Not good for calibration
• A number of hyperparameters to set
• Kernels dependent on data type and data size
• Important classifier to learn
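Two of those hyperparameters (C and the kernel) appear in even the smallest scikit-learn SVM example. Because kernel SVMs are distance-based, scaling the features first matters - so the sketch below (synthetic data with deliberately mismatched feature scales) wraps the classifier in a pipeline:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([1.0, 100.0, 0.01])  # wildly different scales
y = (X[:, 0] > 0).astype(int)

# Scale first: the RBF kernel measures distances, so raw scales would dominate
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm.fit(X, y)
acc = svm.score(X, y)
```

The raw decision scores are not calibrated probabilities, consistent with the "not good for calibration" bullet.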
Cross-Validation
• How can you test if your models are accurate?
• Cannot train and test on the same people (except in small, specific circumstances)
• Often do not have a second data set with the same variables to test against externally
• Internal validation by splitting up the cohort
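Splitting up the cohort this way is what scikit-learn's cross-validation utilities automate; a stratified 5-fold sketch on synthetic data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X[:, 0] + rng.normal(scale=0.5, size=100) > 0).astype(int)

# Each fold keeps the class ratio; train on 4 folds, score on the held-out 5th
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(), X, y, cv=cv)
```

No patient appears in both the training and testing side of any fold, which is the "cannot train and test on the same people" requirement.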
K-fold Stratified Cross-Validation
[Diagram: the loaded data set is split by class (bleeds vs. non-bleeds); each class is then divided into five 20% folds, so every fold preserves the original class ratio.]
Grid Search
[Flowchart: load the data set and build a 5-fold cross-validation; for each split, select the training and testing folds, create a 5-fold cross-validation of the training set, use that inner CV to select features in a hold-out environment, generate a feature ranking and build the model on the fold, validate the model and view feature importance in the holdout, then generate statistics and confidence intervals on the CV; repeat for each fold.]
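The inner hyperparameter-selection loop of that flowchart is what scikit-learn's GridSearchCV implements: it runs a cross-validation per candidate setting and keeps the best. A hedged sketch (synthetic data; the lasso C grid is illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 5))
y = (X[:, 0] > 0).astype(int)

# Try each candidate penalty strength with an inner 5-fold CV, keep the best
grid = GridSearchCV(
    LogisticRegression(penalty="l1", solver="liblinear"),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
grid.fit(X, y)
best_C = grid.best_params_["C"]
```

Wrapping this whole object in an outer cross-validation reproduces the nested (hold-out) structure the flowchart describes.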
Clinical Survival Analysis
• Clinical prediction models are often logistic regression (or some variation of that, or Poisson regression)
• Clinical datasets often come with one extra factor – time to event
• Thus, some clinical models want to present the probability of an event + the likelihood of that event within given windows of time
Kaplan-Meier Survival
• At any given point in time, the probability of survival is the ratio of surviving patients to patients at risk
• At any time, if a patient has already died or dropped out of being monitored, they are no longer considered in the ratio
• R: survival
• Python: lifelines? scikit-survival?
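The slide leaves the Python package open (lifelines? scikit-survival?), but the estimator itself is simple enough to sketch directly with only NumPy. The follow-up times and censoring flags below are invented:

```python
import numpy as np

# times: follow-up time; events: 1 = death observed, 0 = censored (dropped out)
times = np.array([5, 8, 8, 12, 15, 20, 20, 25])
events = np.array([1, 1, 0, 1, 0, 1, 1, 0])

# At each observed event time t: S(t) *= 1 - (deaths at t) / (patients at risk at t)
surv = 1.0
curve = {}
for t in np.unique(times[events == 1]):
    at_risk = np.sum(times >= t)                    # still alive and still monitored
    deaths = np.sum((times == t) & (events == 1))
    surv *= 1 - deaths / at_risk
    curve[int(t)] = surv
```

Censored patients leave the at-risk denominator without counting as deaths, which is exactly the "no longer considered in the ratio" rule above.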
[Figure: Kaplan-Meier survival curves for four groups; y-axis: survival probability (0.0–1.0), x-axis: time in days (0–350).]
Cox Proportional Hazards
• Another survival method
• Models the rate of failure
• The coefficients give comparative effectiveness information when comparing test vs. control
• More on evaluating the outputs next lecture
• Packages:
– R: survival
– Python: lifelines?
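The standard statement of the model (not written out on the slide) makes the coefficient interpretation explicit:

```latex
h(t \mid x) = h_0(t)\, \exp\!\left(\beta^{\top} x\right)
```

Here $h_0(t)$ is an unspecified baseline hazard, so the hazard ratio between a test patient $x_1$ and a control $x_0$ is $\exp\!\left(\beta^{\top}(x_1 - x_0)\right)$, independent of $t$ - this is where the comparative-effectiveness reading of the coefficients comes from.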
Outputs
• Probabilities vs. classification label
• Odds ratios
• Hazard ratios
• Variable importance
• Decision threshold + associated metrics