9/4/17
Machine Learning for Clinical Care Overview
Fall 2017 Thursday, August 31
Outline
• Describe modeling
• Introduce learning goals
• Supervised Machine Learning
• Unsupervised Machine Learning
• R or Python (2.7 – Anaconda)
Lesson’s Goals
• Question(s):
– What is Machine Learning?
– How is it used in Clinical Data?
– How can it be used in systems?
• Goal(s):
– Learn the difference between Supervised and Unsupervised Machine Learning
– Learn the metrics that are used to evaluate model effectiveness
– Learn what packages are available in R and Python
What is Machine Learning?
• Algorithms/methods for statistical learning from data
• In the context of medical analytics:
– Pattern recognition to calculate the probability/risk of certain events
– Match similar individuals to compare outcomes
Valuable References
• TL: Reference for tools with WEKA
• BL: Good reference on metrics and classifiers
• TR: Used in 633 – good mathematical reference
• BR: Great statistical machine learning text – theory-based
Supervised vs. Unsupervised
• Unsupervised: use the data to determine some form of underlying organization or structure
• Supervised: pattern recognition/regression. Knowing a specific value:
– Y = f(X)
– Being able to predict p(y′ | X′) to generate y′ = f(X′)
Unsupervised Machine Learning
• Clustering
• Determining similarity
• Learning a meaningful representation of data from a large, noisy set of data
• CSCE 633 – Machine Learning, or a similar course
• Examples:
– K-Means, Ward's hierarchical clustering, expectation maximization
– Packages: R: mclust, hclust; Python: scikit-learn
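To make the scikit-learn side concrete, a minimal K-Means sketch might look like the following. The two-group synthetic data and all variable names are illustrative, not from the lecture.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two well-separated synthetic "patient" groups in a 2-D feature space
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_              # cluster assignment for each row of X
centers = km.cluster_centers_    # one centroid per cluster
```

With no labels provided, the algorithm recovers the two groups purely from the structure of the data, which is the point of the unsupervised setting described above.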
Supervised Machine Learning
• Know the difference between two classes of objects (multiclass extends from this)
• Want to find the function that defines one versus the other
• (Regression slightly different)
Supervised Machine Learning: an example from Duda/Stork
Supervised Machine Learning: Finding the right input feature
Supervised Machine Learning: using multiple input features
Supervised Machine Learning: non-linear models and overfitting

Machine Learning Pipeline
• Collect Input Data (e.g. body-wearable sensors)
• Segment Appropriately (e.g. w.r.t. time)
• Extract Input Features (reduce dimensionality)
• Train Machine Learning Model
• Validate
• Adjust Decision Threshold based upon Costs
Inputting Data
• Data might be of various types
• Too many dimensions – how do you reduce?
Dimensionality Reduction: PCA
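The original PCA slide is an image; a minimal scikit-learn sketch of the same idea (on synthetic data, with an artificially correlated feature pair) could be:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))                              # 100 samples, 10 features
X[:, 1] = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=100)   # make two features correlated

pca = PCA(n_components=3).fit(X)
X_reduced = pca.transform(X)               # project onto the top 3 components
ratios = pca.explained_variance_ratio_     # fraction of variance each component captures
```

Because feature 1 is nearly a copy of feature 0, PCA folds them into one component, which is exactly the dimensionality-reduction effect the slide illustrates.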
Dimensionality Reduction: Filter Methods
• Use statistical tests to pick the most important variables
• Likelihood ratio test (and p-value)
• Information gain
• Correlation
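Scoring each feature with a univariate statistical test, as above, can be sketched with scikit-learn's SelectKBest. The ANOVA F-test used here stands in for the tests listed on the slide; the data is synthetic.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 8))
X[:, 3] += 2.0 * y                 # make feature 3 strongly class-dependent

# Score every feature independently, then keep only the top k
selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
X_selected = selector.transform(X)
chosen = selector.get_support(indices=True)
```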
Dimensionality Reduction: Wrapper Methods
• Build machine learning methods with various sets of variables and select the one with the best accuracy
• How do you measure accuracy?
• How do you add features? In what order?
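One common wrapper-style answer to "in what order?" is recursive feature elimination: repeatedly refit the model and drop the weakest feature. A hedged scikit-learn sketch on synthetic data:

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 6))
X[:, 0] += 2.0 * y                 # only feature 0 carries real signal

# Fit, eliminate the lowest-weight feature, refit - until 2 features remain
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2).fit(X, y)
mask = rfe.support_                # True for the features that were kept
ranking = rfe.ranking_             # 1 = selected; higher = eliminated earlier
```

Unlike filter methods, the selection here is driven by the model itself, which is what distinguishes wrappers.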
Data Setup
• What happens when features are:
– 0/1 binary patient histories?
– 1/2/3/4 categorical variables with no ordinal nature?
– Continuous values in different numeric ranges?
Inputting Data: Normalization
• Is it best to leave features in their own numeric ranges?
• Put everything in a simple [0,1] interval? [-1,1]?
• Normalize by standard centering and scaling?
• Convert all categorical variables to one-hot binary encoding?
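Two of these options - standard centering/scaling and one-hot encoding - look like this in scikit-learn (the tiny example values are invented for illustration):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# A continuous feature in its own numeric range (e.g. a lab value)
cont = np.array([[120.0], [140.0], [95.0], [160.0]])
scaled = StandardScaler().fit_transform(cont)      # center to mean 0, scale to unit variance

# An unordered categorical feature -> one-hot binary columns
cats = np.array([["A"], ["B"], ["A"], ["C"]])
onehot = OneHotEncoder().fit_transform(cats).toarray()
```

One-hot encoding avoids imposing a false ordering (1 < 2 < 3 < 4) on categorical codes, which answers the "no ordinal nature" question from the previous slide.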
Inputting Data: Missing Data
• Data will often be missing for a variety of reasons
• Machine learning algorithms will need complete data sets
• Can impute in a variety of ways:
– Median value
– Mean value
– Create a machine learning function from complete data to estimate the value
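The median-value strategy is a one-liner with scikit-learn's SimpleImputer (toy matrix invented for illustration):

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0,    2.0],
              [np.nan, 3.0],
              [7.0,    np.nan]])

# Replace each missing entry with its column's median
X_imp = SimpleImputer(strategy="median").fit_transform(X)
```

Swapping strategy="median" for "mean" gives the mean-value option; the model-based option on the slide would instead fit a predictor on the complete rows.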
Outputs
• Supervised learning: p(y′ | X′)
• Often, classification algorithms will output 0/1 (-1/1) – what are they doing?
• Decision threshold at 50%
• This class is concerned with the probability values
Logistic Regression
• Linear, direct regression
• What does this do to restrict inputs?
• How does it optimize the coefficients?
• f(x) is a probability
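The relationship between the probability output and the 0/1 label can be made explicit; a sketch on synthetic data, assuming scikit-learn's LogisticRegression:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

clf = LogisticRegression().fit(X, y)
probs = clf.predict_proba(X)[:, 1]       # P(y=1 | x), the quantity this class cares about
labels = (probs >= 0.5).astype(int)      # the default 50% decision threshold
```

The 0/1 output of predict() is just predict_proba() thresholded at 50%; moving that threshold is what the "adjust decision threshold based upon costs" pipeline step refers to.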
Regression: Loss function
9/4/17
5
Regression: Loss function: https://courses.cs.washington.edu/courses/cse547/16sp/slides/logistic-SGD.pdf
Ensemble Methods
• Multiple learners for the same problem
• Converge onto the right learner
• Can select features while it builds
• Regularization! (Lasso and Elastic Net)
LR with Lasso
• Great starting point for all work
• Important parameter to learn: lambda
• Python:
– scikit-learn: LogisticRegression, SGDClassifier, SGDRegressor (slight issue with sample weights)
• R package: glmnet
– Use cv.glmnet to find the right model + variables
– coef to get the predicted coefficients
– parallel=TRUE option important for R (with the doParallel and foreach packages)
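On the Python side, a glmnet-style lasso fit can be approximated with scikit-learn's L1-penalized LogisticRegression. Note that scikit-learn's C is roughly the inverse of glmnet's lambda: smaller C means stronger shrinkage. Sketch on synthetic data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(int)

# penalty="l1" is the lasso; small C (large lambda) zeroes out weak features
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
n_nonzero = int(np.sum(lasso.coef_ != 0))   # surviving (selected) features
```

The zeroed coefficients are how the lasso "selects features while it builds", per the ensemble-methods slide.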
Random Forest
Source: http://www.iis.ee.ic.ac.uk/icvl/iccv09_tutorial.html
Random Forest: Advantages
• No need to normalize variables
• No need to standardize variables
• Handles categorical variables well
• Multi-class classification
• Probabilities based upon leaf nodes
• Internally tests variable importance and generates rank
• Packages:
– Python: scikit-learn
– R: randomForest, importance=TRUE
– Ranking of variables by Gini or by mean decrease in accuracy
– Important param: number of trees
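The scikit-learn version of these points (leaf-based probabilities, built-in importance ranking, the number-of-trees parameter) can be sketched on synthetic data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 2] > 0).astype(int)            # only feature 2 actually matters

# n_estimators is the "number of trees" parameter the slide calls important
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
probs = rf.predict_proba(X)[:, 1]        # probabilities from leaf-node votes
imp = rf.feature_importances_            # Gini-based importance, sums to 1
```

The importances correctly single out feature 2, illustrating the "internally tests variable importance" bullet without any separate feature-selection step.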
Gradient Descent Boosting
Source: http://www.iis.ee.ic.ac.uk/icvl/iccv09_tutorial.html
Gradient Descent Boosting
• Package: XGBoost (R and Python)
• Understanding variable importance is tricky with xgboost
• Important parameters:
– Number of trees
– Learning rate (eta)
– Max depth of each tree
– Objective: ‘binary:logistic’
– nthread (parallelization)
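XGBoost itself may not be installed everywhere, but scikit-learn's GradientBoostingClassifier exposes the analogous knobs (n_estimators ≈ number of trees, learning_rate ≈ eta, max_depth), so a stand-in sketch of the same parameter choices on synthetic data is:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# n_estimators ~ number of trees, learning_rate ~ eta, max_depth as in xgboost
gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 max_depth=3, random_state=0).fit(X, y)
acc = gbm.score(X, y)
```

The xgboost API takes the same three ideas as num_boost_round / eta / max_depth, plus the objective and nthread settings listed above.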
Other Popular Methods
• Neural Networks
• Support Vector Machines
• Nearest Neighbor classifiers
• Other boosting methods
• Hierarchical Classifiers
Support Vector Machine
h\p://docs.opencv.org/doc/tutorials/ml/introducTon_to_svm/introducTon_to_svm.html
SVM
• e1071 in R; scikit-learn/libSVM in Python
• Requires prior feature selection techniques (otherwise ripe for overfitting)
• Not good for calibration
• A number of hyperparameters to set
• Kernels dependent on data type and data size
• Important classifier to learn
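Two of those hyperparameters (C and the kernel) appear in even the smallest scikit-learn SVM example. Because kernel SVMs are distance-based, scaling the features first matters - so the sketch below (synthetic data with deliberately mismatched feature scales) wraps the classifier in a pipeline:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([1.0, 100.0, 0.01])  # wildly different scales
y = (X[:, 0] > 0).astype(int)

# Scale first: the RBF kernel measures distances, so raw scales would dominate
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm.fit(X, y)
acc = svm.score(X, y)
```

The raw decision scores are not calibrated probabilities, consistent with the "not good for calibration" bullet.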
Cross-Validation
• How can you test if your models are accurate?
• Cannot train and test on the same people (except in small, specific circumstances)
• Often do not have a second data set with the same variables to test against externally
• Internal validation by splitting up the cohort
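Splitting up the cohort this way is what scikit-learn's cross-validation utilities automate; a stratified 5-fold sketch on synthetic data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X[:, 0] + rng.normal(scale=0.5, size=100) > 0).astype(int)

# Each fold keeps the class ratio; train on 4 folds, score on the held-out 5th
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(), X, y, cv=cv)
```

No patient appears in both the training and testing side of any fold, which is the "cannot train and test on the same people" requirement.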
K-fold Stratified Cross-Validation
[Diagram: the loaded data set is split by class (bleeds vs. non-bleeds); each class is then divided into five 20% folds, so every fold preserves the original class ratio.]
Grid Search
[Flowchart: load the data set and build a 5-fold cross-validation; for each split, select the training and testing folds, create a 5-fold cross-validation of the training set, use that inner CV to select features in a hold-out environment, generate a feature ranking and build the model on the fold, validate the model and view feature importance in the holdout, then generate statistics and confidence intervals on the CV; repeat for each fold.]
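The inner hyperparameter-selection loop of that flowchart is what scikit-learn's GridSearchCV implements: it runs a cross-validation per candidate setting and keeps the best. A hedged sketch (synthetic data; the lasso C grid is illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 5))
y = (X[:, 0] > 0).astype(int)

# Try each candidate penalty strength with an inner 5-fold CV, keep the best
grid = GridSearchCV(
    LogisticRegression(penalty="l1", solver="liblinear"),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
grid.fit(X, y)
best_C = grid.best_params_["C"]
```

Wrapping this whole object in an outer cross-validation reproduces the nested (hold-out) structure the flowchart describes.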
Clinical Survival Analysis
• Clinical prediction models are often logistic regression (or some variation of that, or Poisson regression)
• Clinical datasets often come with one extra factor – time to event
• Thus, some clinical models want to present the probability of an event + the likelihood of that event within given windows of time
Kaplan-Meier Survival
• At any given point in time, the probability of survival is the ratio of surviving patients to patients at risk
• At any time, if a patient has already died or dropped out of being monitored, they are no longer considered in the ratio
• R: survival
• Python: lifelines? scikit-survival?
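The slide leaves the Python package open (lifelines? scikit-survival?), but the estimator itself is simple enough to sketch directly with only NumPy. The follow-up times and censoring flags below are invented:

```python
import numpy as np

# times: follow-up time; events: 1 = death observed, 0 = censored (dropped out)
times = np.array([5, 8, 8, 12, 15, 20, 20, 25])
events = np.array([1, 1, 0, 1, 0, 1, 1, 0])

# At each observed event time t: S(t) *= 1 - (deaths at t) / (patients at risk at t)
surv = 1.0
curve = {}
for t in np.unique(times[events == 1]):
    at_risk = np.sum(times >= t)                    # still alive and still monitored
    deaths = np.sum((times == t) & (events == 1))
    surv *= 1 - deaths / at_risk
    curve[int(t)] = surv
```

Censored patients leave the at-risk denominator without counting as deaths, which is exactly the "no longer considered in the ratio" rule above.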
[Figure: Kaplan-Meier survival curves for four groups; y-axis: survival probability (0.0–1.0), x-axis: time in days (0–350).]
Cox Proportional Hazards
• Another survival method
• Models the rate of failure
• The coefficients give comparative effectiveness information when comparing test vs. control
• More on evaluating the outputs next lecture
• Packages:
– R: survival
– Python: lifelines?
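The standard statement of the model (not written out on the slide) makes the coefficient interpretation explicit:

```latex
h(t \mid x) = h_0(t)\, \exp\!\left(\beta^{\top} x\right)
```

Here $h_0(t)$ is an unspecified baseline hazard, so the hazard ratio between a test patient $x_1$ and a control $x_0$ is $\exp\!\left(\beta^{\top}(x_1 - x_0)\right)$, independent of $t$ - this is where the comparative-effectiveness reading of the coefficients comes from.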
Outputs
• Probabilities vs. classification label
• Odds ratios
• Hazard ratios
• Variable importance
• Decision threshold + associated metrics