© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Michael Brückner
Manager Machine Learning
25/02/2016
Machine Learning 101
Agenda
• What is Machine Learning and why do we need it?
• Model Building
• Model Evaluation & Tuning
What is Machine Learning?
Methods and Systems that …
• Adapt based on recorded data
• Predict new data based on recorded data
• Optimize an action given a utility function
• Extract hidden structure from the data
• Summarize data into concise descriptions
What is Machine Learning NOT?
Methods and Systems that …
• can yield Garbage-In Knowledge-Out
• perform well without data modeling & feature engineering
• avoid the curse-of-dimensionality
• are a replacement for business rules
Infer-Predict-Decide Cycle
• Inference: Build & evaluate Predictor
• Prediction: Apply the learned Predictor
• Decision Making: Adjust Business loss and get new/more data
What for?
Automate tasks, which typically require humans in order to
• scale
• improve over humans (non-experts)
• preserve privacy
or solve tasks that are impossible for humans
Examples: Personalized Recommendation
• Input:
Examples: Personalized Recommendation
• Output:
Examples: Face Detection & Recognition
Face detection
• Input: image
• Output: face position
Face recognition
• Input: face (image & face position)
• Output: person’s name
Examples: Full-Text Translation
• Input: text in one language
• Output: text of another language
Examples: Spam Filtering
• Input: email (text, images, …)
• Output: spam/non-spam flag
• Challenges:
• extremely high precision for legitimate emails
• spam changes constantly
• noisy ground truth
Supervised Machine Learning
1. Model problem in terms of input data and output data
2. Collect sample of input-output pairs
3. Learn a mapping that produces the output given the input
4. Apply this function on new inputs to make predictions
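The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy data is made up, and the nearest-centroid rule in `learn_mapping` is one simple choice of learning method, not something prescribed by the slides.

```python
# Steps 1-2: model the problem as (input, output) pairs and collect a sample.
# Hypothetical toy data: two numeric attributes per input, labels in {-1, +1}.
training_data = [
    ((1.0, 1.0), -1),
    ((1.5, 2.0), -1),
    ((5.0, 6.0), +1),
    ((6.0, 5.5), +1),
]


def learn_mapping(pairs):
    """Step 3: learn a mapping. Here: a nearest-centroid rule (an assumption)."""
    sums, counts = {}, {}
    for x, y in pairs:
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[y] = counts.get(y, 0) + 1
    # One centroid (attribute-wise mean) per output value.
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        # Return the output value whose centroid is closest to x.
        return min(centroids, key=lambda y: sq_dist(centroids[y]))

    return predict


predictor = learn_mapping(training_data)

# Step 4: apply the learned mapping to new inputs.
predictions = [predictor((1.2, 1.1)), predictor((5.5, 5.0))]
```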
A Programmer’s Perspective
• Traditional Programming (Predicting): Input Data + Mapping → Computer → Output Data
• Supervised Machine Learning: Input Data + Output Data → Computer → Mapping
Advantages
• Use data instead of intuition to derive the mapping
• Can solve very complex tasks
• Can adapt to new situations (collect more data)
• Does not require much expert knowledge
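Input Data
Description             | Type              | Cost   | Actual Cost | Diff   | In Catalogue
Movies                  | Entertainment     | $50    | $28         | $22    | Yes
Music (CDs, MP3s, etc.) |                   | $500   | $30         | $470   | No
Sporting Events         | Entertainment     | $0     | $40         | ($40)  | No
Dining Out              | Food              | $1,000 | $1,200      | ($200) | Yes
Groceries               |                   | $100   | $0          | $100   | Yes
Charity 1               | Gifts and Charity | $200   | $200        | $0     | No
Charity 2               |                   | $500   | $500        | $0     | No
Cable/Satellite         | Housing           | $100   | $100        | $0     | Yes
Electric                | Housing           | $45    | $40         | $5     | Yes
Mortgage or Rent        |                   | $700   | $700        | $0     | Yes
Health                  | Insurance         | $400   | $400        | $0     | Yes
Home                    | Insurance         | $400   | $400        | $0     | No
Credit Card 1           |                   |        |             | $0     | Yes
Annotations: the whole table is the Dataset; each column is an Attribute with an Attribute Name (column header) and Attribute Values (cells). Description is Text Data, Type is Categorical Data, Cost, Actual Cost and Diff are Numerical Data, In Catalogue is Binary Data, and empty cells are Missing Data.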
Output Data
Description             | Type              | Cost   | Actual Cost | Diff   | In Catalogue
Movies                  | Entertainment     | $50    | $28         | $22    | Yes
Music (CDs, MP3s, etc.) | ?                 | $500   | $30         | $470   | No
Sporting Events         | Entertainment     | $0     | $40         | ($40)  | No
Dining Out              | Food              | $1,000 | $1,200      | ($200) | Yes
Groceries               | ?                 | $100   | $0          | $100   | Yes
Charity 1               | Gifts and Charity | $200   | $200        | $0     | No
Charity 2               | ?                 | $500   | $500        | $0     | No
Cable/Satellite         | Housing           | $100   | $100        | $0     | Yes
Electric                | Housing           | $45    | $40         | $5     | Yes
Mortgage or Rent        | ?                 | $700   | $700        | $0     | Yes
Health                  | Insurance         | $400   | $400        | $0     | Yes
Home                    | Insurance         | $400   | $400        | $0     | No
Credit Card 1           | ?                 |        |             | $0     | Yes
Target Attribute: Type. Target Attribute Values: the entries of the Type column; "?" marks an unknown value.
Agenda
• What is Machine Learning and why do we need it?
• Model Building
• Model Evaluation & Tuning
Problem Setting
• Input: vector of observable attributes, $x$
• Output: target attribute value, $y$
• Training data: pairs of input and corresponding output, $D = \{(x_1, y_1), \dots, (x_N, y_N)\}$
• Application data: inputs only
• Goal: learn mapping (the Predictor) $f_w: x \mapsto y$
Challenges in Model Building
• Which function class for Predictor (data modeling)?
• How to pre-process the data (feature engineering)?
• How to learn this Predictor from our training data?
• How to generalize to new data?
Which function class for Predictor?
Types of prediction tasks (output type):
• Binary Classification ⇒ binary target $y \in \{-1, +1\}$
• Multinomial Classification ⇒ categorical target $y \in \{1, \dots, K\}$
• Regression ⇒ numeric target $y \in [l, u] \subset \mathbb{R}$
Which function class for Binary Classification?
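• Decision Tree
[Figure: decision tree with the tests x2 > 7?, x1 < 3?, x2 < 5? and x1 < 1? (branches no/yes), next to the corresponding partition of the (x1, x2) plane into + and - regions; axis ticks at x1 = 1, 3 and x2 = 5, 7]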
Which function class for Binary Classification?
• Decision Tree
[Figure: decision tree with the tests x2 > 7?, x1 < 3?, x2 < 5? and x1 < 1? (branches no/yes) and the resulting + and - regions in the (x1, x2) plane]
Which function class for Binary Classification?
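• Linear function
• binary target attribute values $y \in \{-1, +1\}$
• prediction: $\hat{y}(x) = \operatorname{sign}(f_w(x))$
• separating hyperplane: $H_w = \{x \mid f_w(x) = x^T w + w_0 = 0\}$
[Figure: hyperplane $H_w$ separating + and - points in the (x1, x2) plane]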
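A minimal sketch of the linear decision rule on this slide, sign(x^T w + w_0). The parameter values are hypothetical, and mapping a score of exactly 0 to +1 is a convention chosen here, not something the slide specifies.

```python
def f(x, w, w0):
    """Linear score f_w(x) = x^T w + w0."""
    return sum(xi * wi for xi, wi in zip(x, w)) + w0


def predict(x, w, w0):
    """Predicted label sign(f_w(x)); a score of exactly 0 is mapped to +1 here."""
    return 1 if f(x, w, w0) >= 0 else -1


# Hypothetical parameters: the hyperplane H_w is the line x1 - x2 + 0.5 = 0.
w, w0 = [1.0, -1.0], 0.5

label_a = predict([2.0, 1.0], w, w0)  # score  1.5, positive side of H_w
label_b = predict([0.0, 2.0], w, w0)  # score -1.5, negative side of H_w
```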
Which function class for Binary Classification?
• Generalized linear function (Kernel methods)
• Layered Generalized linear function (Neural Networks)
• Ensemble of functions
• …
[Figure: + and - points in the (x1, x2) plane]
How to pre-process the data?
• Predictor’s function class defined for limited input domain
⇒ transform/extract attributes first (pre-processing)
• Number to (normalized) Number:
• z-standardization, min-max normalization
• Number to Category:
• Binning (quantile, equidistant)
• Category to (numeric) Vector:
• One-hot encoding
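The transformations listed above can be sketched with the Python standard library. The function names and the sample values are illustrative assumptions; the sketch also assumes the numeric attribute is not constant (otherwise the divisions below would fail).

```python
import statistics


def z_standardize(values):
    """Number to normalized number: subtract the mean, divide by the std. deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / std for v in values]


def min_max_normalize(values):
    """Number to normalized number: rescale linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def equidistant_bin(values, n_bins):
    """Number to category: index of the equal-width bin each value falls into."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # The maximum value is clamped into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def one_hot(categories):
    """Category to numeric vector: one indicator position per distinct category."""
    vocab = sorted(set(categories))
    return [[1 if c == v else 0 for v in vocab] for c in categories]


costs = [50.0, 500.0, 0.0, 1000.0]                    # made-up numeric attribute
types = ["Entertainment", "Food", "Housing", "Food"]  # made-up categorical attribute

scaled = min_max_normalize(costs)
bins = equidistant_bin(costs, 4)
encoded = one_hot(types)
standardized = z_standardize(costs)
```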
How to pre-process the data?
• Predictor’s function class defined for limited input domain
⇒ transform/extract attributes first (pre-processing)
• Text to (numeric) Vector:
• Normalization, tokenization, stemming
• Bag-of-Words, Bag-of-NGrams, TF-IDF ⇒ sparse vector
• Latent word embedding (LSI, word2vec, LDA) ⇒ dense vector
• Image to (numeric) Vector:
• HoG, DAISY, color histogram
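For the text-to-vector path, a rough standard-library sketch of normalization, tokenization, Bag-of-Words and TF-IDF follows. Stemming is left out, the idf formula log(N / document frequency) is only one common variant, and the two example texts are invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Normalization (lower-casing) and tokenization; no stemming in this sketch."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(docs):
    """Return the vocabulary and one sparse count vector (a dict) per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, counts


def tf_idf(counts):
    """Weight term counts by idf = log(N / document frequency)."""
    n_docs = len(counts)
    df = Counter(term for c in counts for term in c)
    return [{t: tf * math.log(n_docs / df[t]) for t, tf in c.items()} for c in counts]


docs = ["Cheap pills, cheap offers", "Meeting agenda for Monday"]  # made-up emails
vocab, counts = bag_of_words(docs)
weights = tf_idf(counts)
```

Only the terms that occur in a document are stored, which is what makes the representation sparse.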
How to learn a Predictor?
• Loss of Predictor $f_w: x \mapsto y$ for a given input-output pair:
$L(y, f_w(x))$, with loss function $L$, ground truth $y$ and prediction $f_w(x)$
How to learn a Predictor?
Loss functions for binary classification (target $y \in \{-1, +1\}$)
How to learn a Predictor?
Function Class                      | Loss Function  | Learning Algorithm
Decision Trees                      | 0/1 loss       | ID3
Decision Trees                      | Quadratic loss | CART
Linear function                     | Quadratic loss | Least-squares regression
Linear function                     | Logistic loss  | Logistic regression
Linear function                     | Hinge loss     | Support Vector Machines
Layered Generalized Linear function | Logistic loss  | Neural Networks (Binary Classification)
Layered Generalized Linear function | Quadratic loss | Neural Networks (Regression)
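The four loss functions named in the table can be written down as follows for a target y in {-1, +1} and a real-valued score f_w(x). These are the usual textbook forms; the exact scaling used on the original slides is not recoverable, so treat the constants as assumptions.

```python
import math


def zero_one_loss(y, score):
    """1 if the sign of the score disagrees with y (or the score is 0), else 0."""
    return 0.0 if y * score > 0 else 1.0


def quadratic_loss(y, score):
    """Squared difference between ground truth and score."""
    return (y - score) ** 2


def logistic_loss(y, score):
    """log(1 + exp(-y * score)); smooth and decreasing in the margin y * score."""
    return math.log(1.0 + math.exp(-y * score))


def hinge_loss(y, score):
    """max(0, 1 - y * score); zero once the margin is at least 1."""
    return max(0.0, 1.0 - y * score)
```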
How to learn a Predictor?
• Theoretical Risk: average loss over all possible data
• Empirical Risk: average loss over training data
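The formulas on this slide did not survive extraction. In the standard formulation, and using the notation of the earlier slides, the two risks would read as below; the symbols $R$ and $\hat{R}$ are assumptions introduced here.

```latex
% Theoretical risk: expected loss over the (unknown) data distribution
R(w) = \mathbb{E}_{(x, y)} \big[ L(y, f_w(x)) \big]

% Empirical risk: average loss over the N training pairs in D
\hat{R}(w) = \frac{1}{N} \sum_{i=1}^{N} L(y_i, f_w(x_i))
```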
How to learn a Predictor?
• Prediction $\hat{y}(x)$ depends on Predictor $f_w$ with model parameters $w$
• Minimize Risk w.r.t. those model parameters $w$ ⇒ mathematical Optimisation Problem
• Gradient-based first or second-order methods
• Coordinate-descent methods
• (Greedy) Search
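As an illustration of a gradient-based first-order method, the sketch below minimizes the empirical logistic loss of a linear Predictor with plain gradient descent. The toy data, the step size and the number of steps are assumptions made for this example.

```python
import math


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def empirical_risk(w, data):
    """Average logistic loss over the training pairs (x, y), y in {-1, +1}."""
    return sum(math.log(1.0 + math.exp(-y * dot(w, x))) for x, y in data) / len(data)


def gradient(w, data):
    """Gradient of the empirical risk with respect to w."""
    grad = [0.0] * len(w)
    for x, y in data:
        # d/dw log(1 + exp(-y w.x)) = -y x / (1 + exp(y w.x))
        coeff = -y / (1.0 + math.exp(y * dot(w, x)))
        for j, xj in enumerate(x):
            grad[j] += coeff * xj / len(data)
    return grad


def gradient_descent(data, n_features, learning_rate=0.5, n_steps=1000):
    """First-order method: repeatedly step against the gradient."""
    w = [0.0] * n_features
    for _ in range(n_steps):
        g = gradient(w, data)
        w = [wj - learning_rate * gj for wj, gj in zip(w, g)]
    return w


# Hypothetical toy data; the constant 1.0 in each input plays the role of w_0.
data = [((1.0, 0.5), -1), ((1.0, 1.0), -1), ((1.0, 3.0), +1), ((1.0, 3.5), +1)]
w = gradient_descent(data, n_features=2)
```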
How to generalize to new data?
[Figure: Error plotted against Model Complexity]
How to generalize to new data?
• Empirical Risk: average loss over the training data
• Structural Risk: Empirical Risk plus a Regularizer term
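The slide's formulas are missing from the extracted text. A common way to write the structural (regularized) risk is shown below; the weight $\lambda$, the symbol $\Omega$ and the squared-norm example are assumptions, not taken from the original.

```latex
% Structural risk: empirical risk plus a weighted Regularizer
\hat{R}_{\lambda}(w) = \frac{1}{N} \sum_{i=1}^{N} L(y_i, f_w(x_i)) + \lambda \, \Omega(w),
\qquad \text{e.g. } \Omega(w) = \lVert w \rVert_2^2
```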
Agenda
• What is Machine Learning and why do we need it?
• Model Building
• Model Evaluation & Tuning
Performance for Binary Classification
                   | True Target: positive | True Target: negative
Predicted positive | True Positive (TP)    | False Positive (FP)
Predicted negative | False Negative (FN)   | True Negative (TN)
Total number of data points: N
Performance for Binary Classification
• Accuracy: $\frac{TP + TN}{N}$
• Recall (true positive rate): $\frac{TP}{TP + FN}$
• Precision: $\frac{TP}{TP + FP}$
• Fall-out (false positive rate): $\frac{FP}{TN + FP}$
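A small sketch that counts the four confusion-matrix cells and evaluates the measures above. The label vectors are made up, and the sketch assumes every denominator is non-zero.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for labels in {-1, +1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Accuracy, recall, precision and fall-out as defined on the slide."""
    n = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / n,
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "fall_out": fp / (tn + fp),
    }


y_true = [1, 1, 1, -1, -1, -1, -1, -1]  # made-up ground truth
y_pred = [1, 1, -1, 1, -1, -1, -1, -1]  # made-up predictions
result = metrics(*confusion_counts(y_true, y_pred))
```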
Performance for Binary Classification
Decision function: $\hat{y}(x) = \operatorname{sign}(f_w(x) + b)$, with Predictor $f_w$ and Decision threshold $b$
AUC (Area Under ROC Curve)
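One way to compute the AUC without drawing the curve is the pairwise formulation: the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as one half. The sketch below uses that formulation on invented scores; it is quadratic in the number of examples and meant for illustration only.

```python
def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == -1]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def decide(score, b):
    """Decision function sign(score + b); a value of exactly 0 is mapped to +1."""
    return 1 if score + b >= 0 else -1


y_true = [1, 1, -1, -1]        # made-up ground truth
scores = [0.9, 0.4, 0.5, 0.1]  # made-up Predictor outputs f_w(x)
area = auc(y_true, scores)
```

The AUC does not depend on the Decision threshold b, whereas the individual decisions do.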
Training vs. Test Performance
How do we know that a Predictor works well on new data?
Small error on training data ≠ small error on new data (test data)!
Hold-out Evaluation
• Put some data aside before training = test data
• Use this hold-out data for evaluation
• Disadvantages:
• What if we were (un)lucky when choosing the hold-out data?
• We do NOT use all the data for model training!
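A hold-out split can be sketched as below. The 25% test fraction and the fixed seed are arbitrary choices for the example; a different seed yields a different split, which is exactly the "(un)lucky" effect mentioned above.

```python
import random


def hold_out_split(data, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data and set a fraction aside as test data."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (training data, test data)


data = list(range(20))  # stand-in for 20 input-output pairs
train, test = hold_out_split(data)
```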
K-Fold Cross Validation-based Evaluation
• Split data into K partitions (folds)
• Take all but one partition to train a Predictor
• Evaluate Predictor on the left-out partition
• Repeat this for all partitions
• Average performance for all K evaluations
• Finally train a Predictor on all data
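The procedure above can be sketched as follows. `train_majority` and `accuracy` are hypothetical stand-ins for a real learning method and metric, and the interleaved fold assignment is one simple way to partition the data.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def cross_validate(data, k, train_fn, eval_fn):
    """Train on all but one fold, evaluate on the left-out fold, average over folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(data), k):
        predictor = train_fn([data[j] for j in train_idx])
        scores.append(eval_fn(predictor, [data[j] for j in test_idx]))
    return sum(scores) / k


def train_majority(pairs):
    """Hypothetical learner: always predict the most frequent training label."""
    labels = [y for _, y in pairs]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority


def accuracy(predictor, pairs):
    return sum(predictor(x) == y for x, y in pairs) / len(pairs)


data = [(i, 1 if i < 8 else -1) for i in range(10)]  # made-up input-output pairs
cv_accuracy = cross_validate(data, k=5, train_fn=train_majority, eval_fn=accuracy)
final_predictor = train_majority(data)  # finally, train on all data
```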
Model Tuning
Learning methods and Predictors have hyper-parameters
• Amount of regularization
• Choice of loss function
• Decision threshold score
• Learning rate
• …
Example: Decision threshold
How to choose hyper-parameters?
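Grid Search:
• Evaluate Predictor for all grid points (hyper-parameter combinations)
• Take best grid point
Very expensive!
[Figure: grid of hyper-parameter combinations on logarithmically scaled axes, with ticks at powers of 10 and powers of 2]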
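Grid search itself is a short loop over the Cartesian product of the candidate values. In the sketch below the hyper-parameter names, the grid values and the `evaluate` function are all hypothetical; in practice `evaluate` would train a Predictor and return its validation score, which is what makes the search expensive.

```python
import itertools
import math


def grid_search(grid, evaluate):
    """Evaluate every hyper-parameter combination and keep the best one."""
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Logarithmically spaced grid (powers of 10 and powers of 2).
grid = {
    "regularization": [10.0 ** e for e in (-2, -1, 0, 1, 2)],
    "learning_rate": [2.0 ** e for e in (-1, 0, 1)],
}


def evaluate(params):
    """Hypothetical stand-in for a validation score; best at (1.0, 1.0)."""
    return (
        -(math.log10(params["regularization"]) ** 2)
        - (math.log2(params["learning_rate"]) ** 2)
    )


best_params, best_score = grid_search(grid, evaluate)
```

With 5 × 3 grid points this costs 15 evaluations; the count multiplies with every additional hyper-parameter.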
How to choose hyper-parameters?
Bayesian Optimisation:
• Learn model to predict evaluation outcomes
• Evaluate Predictor only for promising grid points
• Take best grid point after fixed number of evaluations
[Figure: hyper-parameter grid on logarithmically scaled axes, with ticks at powers of 10 and powers of 2]
Common Pitfalls
• Model tuning is part of training
⇒ Do NOT use test data or test CV partitions!
• Use proper grid resolution and axis scaling
• Use same metric for tuning as for evaluation
Thank you!