© 2013 The MathWorks, Inc.
Machine Learning with MATLAB
Jasmina Lazić, BSc, PhD
Application Engineer
What You Will Learn
Overview of machine learning
Algorithms available with MATLAB
– Classification
– Data mining of large datasets
– Clustering
MATLAB as an interactive environment for evaluating and choosing the best algorithm
Machine Learning Characteristics and Examples
Characteristics
– Lots of data (many variables)
– System too complex to know the governing equation (e.g., black-box modeling)
Examples
– Pattern recognition (speech, images)
– Financial algorithms (credit scoring, algo trading)
– Energy forecasting (load, price)
– Biology (tumor detection, drug discovery)
[Figure: matrix of percentages by credit rating; each row sums to approximately 100%]
          AAA       AA        A      BBB       BB        B      CCC        D
AAA    93.68%    5.55%    0.59%    0.18%    0.00%    0.00%    0.00%    0.00%
AA      2.44%   92.60%    4.03%    0.73%    0.15%    0.00%    0.00%    0.06%
A       0.14%    4.18%   91.02%    3.90%    0.60%    0.08%    0.00%    0.08%
BBB     0.03%    0.23%    7.49%   87.86%    3.78%    0.39%    0.06%    0.16%
BB      0.03%    0.12%    0.73%    8.27%   86.74%    3.28%    0.18%    0.64%
B       0.00%    0.00%    0.11%    0.82%    9.64%   85.37%    2.41%    1.64%
CCC     0.00%    0.00%    0.00%    0.37%    1.84%    6.24%   81.88%    9.67%
D       0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
Challenges – Machine Learning
Significant technical expertise required
No “one size fits all” solution
Locked into Black Box solutions
Time required to conduct the analysis
Overview – Machine Learning
Machine Learning (type of learning, then categories of algorithms)
Supervised Learning: develop predictive model based on both input and output data
– Classification
– Regression
Unsupervised Learning: group and interpret data based only on input data
– Clustering
Supervised Learning
Regression
– Non-linear Regression (GLM, Logistic)
– Linear Regression
– Decision Trees
– Ensemble Methods
– Neural Networks
Classification
– Nearest Neighbor
– Discriminant Analysis
– Naive Bayes
– Support Vector Machines
Unsupervised Learning
Clustering
– k-Means, Fuzzy C-Means
– Hierarchical
– Neural Networks
– Gaussian Mixture
– Hidden Markov Model
Supervised Learning - Workflow
Import Data
Explore Data
Prepare Data
Select Model
Train the Model
– Known data + known responses → Model
Measure Accuracy
Use for Prediction
– Model + new data → Predicted responses
Speed up Computations
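The train, measure, and predict steps of this workflow can be sketched on toy data. This is a minimal illustration in plain Python, not MATLAB toolbox code; the data, the 1-nearest-neighbor model, and the function names `split`, `predict_1nn`, and `accuracy` are all assumptions made for the example.

```python
import random

def split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and hold out a fraction for measuring accuracy."""
    rng = random.Random(seed)
    rows = rows[:]
    rng.shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]

def predict_1nn(train, x):
    """Return the known response of the training point closest to x."""
    nearest = min(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))
    return nearest[1]

def accuracy(train, test):
    """Fraction of held-out rows whose known response is predicted correctly."""
    hits = sum(predict_1nn(train, x) == y for x, y in test)
    return hits / len(test)

# Known data and known responses (hypothetical, two well-separated groups).
data = [((0.1 * i, 0.2 * i), "no") for i in range(10)] + \
       [((5 + 0.1 * i, 5 + 0.2 * i), "yes") for i in range(10)]

train, test = split(data)                      # prepare data
print("held-out accuracy:", accuracy(train, test))                  # measure accuracy
print("prediction for new data:", predict_1nn(train, (5.2, 5.5)))   # use for prediction
```

Holding out part of the known data is one common way to measure accuracy; the slides do not say which validation scheme the demo uses.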
Demo – Bank Marketing Campaign
Goal:
– Predict whether a customer will subscribe to a bank term deposit, based on the customer's attributes
Approach:
– Train a classifier using different models
– Measure accuracy and compare models
– Reduce model complexity
– Use classifier for prediction
Data set downloaded from the UCI Machine Learning Repository:
http://archive.ics.uci.edu/ml/datasets/Bank+Marketing
[Figure: bar chart "Bank Marketing Campaign Misclassification Rate". Y-axis: Percentage (0 to 100). X-axis (models): Neural Net, Logistic Regression, Discriminant Analysis, k-nearest Neighbors, Naive Bayes, Support VM, Decision Trees, TreeBagger, Reduced TB. Series: No Misclassified, Yes Misclassified.]
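The chart on this slide compares models by how often each true class ("No" and "Yes") is misclassified. A minimal sketch of that per-class calculation follows, in plain Python; the function name `misclassification_rates` and the label lists are hypothetical and are not results from the bank marketing data.

```python
def misclassification_rates(actual, predicted):
    """Percentage of each true class that the classifier got wrong."""
    rates = {}
    for label in sorted(set(actual)):
        total = sum(1 for a in actual if a == label)
        wrong = sum(1 for a, p in zip(actual, predicted) if a == label and p != label)
        rates[label] = 100.0 * wrong / total
    return rates

# Hypothetical labels for illustration only.
actual    = ["no", "no", "no", "no", "yes", "yes", "yes", "yes"]
predicted = ["no", "no", "no", "yes", "yes", "no", "no", "yes"]
print(misclassification_rates(actual, predicted))
```

Reporting the rate per class matters when one class is much rarer than the other, because a single overall error rate can hide poor performance on the rare class.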
Artificial Neural Networks
Another way to do data-driven modeling
Building blocks are neurons
– Simple transfer function
– Weights and biases to offset and scale the inputs
Neurons grouped into layers
Layers grouped into networks
Training modifies the weights and biases so that the network models a regression or classification problem
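The building blocks listed above (neuron, layer, network) can be sketched as a forward pass in plain Python. The weights below are hypothetical and untrained, the function names are assumptions for this example, and training itself is not shown.

```python
import math

def neuron(inputs, weights, bias, transfer=math.tanh):
    """Scale the inputs by the weights, offset by the bias, apply the transfer function."""
    return transfer(sum(w * x for w, x in zip(weights, inputs)) + bias)

def layer(inputs, weight_rows, biases, transfer=math.tanh):
    """A layer is a group of neurons that all see the same inputs."""
    return [neuron(inputs, w, b, transfer) for w, b in zip(weight_rows, biases)]

def network(inputs, layers):
    """A network feeds each layer's outputs into the next layer."""
    for weight_rows, biases, transfer in layers:
        inputs = layer(inputs, weight_rows, biases, transfer)
    return inputs

def identity(v):
    """Linear transfer function, used here for the output neuron."""
    return v

# Hypothetical weights: 2 inputs -> 2 hidden neurons (tanh) -> 1 linear output.
layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0], math.tanh),
    ([[1.0, 2.0]], [0.5], identity),
]
print(network([1.0, 1.0], layers))
```

Training would adjust the numbers in `layers` to reduce the error between the network output and the known responses.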
Generalized Linear Models
Extends the linear model
– Define the relationship between the linear predictor and the response variable (the link function)
– Model error distributions other than normal
Logistic regression
– Response variable is binary (true / false)
– Results are typically expressed as an odds ratio
Poisson regression
– Model count data (non-negative integers)
– Response variable comes from a Poisson distribution
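As a small worked illustration of the two models above: with a logit link, increasing a predictor by one unit multiplies the odds by exp(coefficient), which is the odds ratio; with a log link, the expected count is the exponential of the linear predictor. The coefficients and function names below are hypothetical and not fitted to any data.

```python
import math

def logistic_probability(intercept, coef, x):
    """Logit link: log(p / (1 - p)) = intercept + coef * x."""
    return 1.0 / (1.0 + math.exp(-(intercept + coef * x)))

def odds(p):
    """Odds of an event that has probability p."""
    return p / (1.0 - p)

def poisson_mean(intercept, coef, x):
    """Log link: log(expected count) = intercept + coef * x."""
    return math.exp(intercept + coef * x)

# Hypothetical coefficients for illustration.
b0, b1 = -1.0, 0.7
p_at_2 = logistic_probability(b0, b1, 2.0)
p_at_3 = logistic_probability(b0, b1, 3.0)
print("odds ratio for a one-unit increase in x:", odds(p_at_3) / odds(p_at_2))
print("exp(b1):", math.exp(b1))
print("expected count at x = 2:", poisson_mean(0.2, 0.5, 2.0))
```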
Decision Trees
For use when the relationship between predictors and response is unknown
Set of if-then statements that lead to a prediction
Can be used in aggregate to improve predictive accuracy
Bagging (bootstrap aggregation) uses many trees, each built from a resample of the data set
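Bagging can be sketched with very small trees. The example below is plain Python, not the MATLAB implementation: it fits one-split regression trees (stumps) to bootstrap resamples and averages their predictions. The data and the names `fit_stump` and `bagged_trees` are assumptions for illustration.

```python
import random
from statistics import mean

def fit_stump(rows):
    """Find the single split on x that minimises squared error (a one-level tree)."""
    xs = sorted({x for x, _ in rows})
    best = None
    for lo, hi in zip(xs, xs[1:]):
        cut = (lo + hi) / 2
        left = [y for x, y in rows if x < cut]
        right = [y for x, y in rows if x >= cut]
        ml, mr = mean(left), mean(right)
        err = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, cut, ml, mr)
    if best is None:  # the resample held a single distinct x value
        m = mean(y for _, y in rows)
        return lambda x: m
    _, cut, ml, mr = best
    return lambda x: ml if x < cut else mr

def bagged_trees(rows, n_trees=25, seed=0):
    """Fit each tree on a bootstrap resample (drawn with replacement), then average."""
    rng = random.Random(seed)
    trees = [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_trees)]
    return lambda x: mean(tree(x) for tree in trees)

# Hypothetical data: the response steps from about 10 to about 30 at x = 5.
rows = [(x, 10.0 + 0.1 * x) for x in range(5)] + [(x, 30.0 + 0.1 * x) for x in range(5, 10)]
model = bagged_trees(rows)
print(model(2), model(8))
```

Averaging is the usual way to aggregate regression trees; for classification, a majority vote across trees plays the same role.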
1  if Age<21 then node 2 elseif Age>=21 then node 3 else 35.7928
2  if Cement<354.5 then node 4 elseif Cement>=354.5 then node 5 else 23.3874
3  if Cement<355.95 then node 6 elseif Cement>=355.95 then node 7 else 41.3278
4  fit = 18.3255
5  fit = 34.9657
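The rule listing on this slide reads directly as nested if-then statements. A minimal Python sketch of the nodes shown follows. Nodes 6 and 7 do not appear in the listing, so the sketch stops at node 3 and returns that node's `else` value as a fallback, which is an assumption; the function name `predict_response` is also hypothetical.

```python
def predict_response(age, cement):
    """Follow the if-then rules of the partial tree listing to a fitted value."""
    if age < 21:                 # node 1 -> node 2
        if cement < 354.5:       # node 2 -> node 4
            return 18.3255
        return 34.9657           # node 2 -> node 5
    # node 1 -> node 3: its children (nodes 6 and 7) are not listed,
    # so return node 3's own value instead of descending further (assumption).
    return 41.3278

print(predict_response(age=7, cement=300.0))
print(predict_response(age=7, cement=400.0))
print(predict_response(age=28, cement=300.0))
```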
Support Vector Machines
Find a boundary between the classes that maximises the margin (the minimum distance from the boundary to any of the points)
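The margin definition can be made concrete with a small sketch. The code below does not train a support vector machine; it only computes the margin of candidate linear boundaries on toy data and keeps the widest one. The points, labels, candidate lines, and function names are hypothetical.

```python
import math

def margin(w, b, points):
    """Minimum distance from the line w[0]*x + w[1]*y + b = 0 to any point."""
    norm = math.hypot(w[0], w[1])
    return min(abs(w[0] * x + w[1] * y + b) / norm for x, y in points)

def separates(w, b, points, labels):
    """True if every point lies on the side of the line matching its +1/-1 label."""
    return all(lab * (w[0] * x + w[1] * y + b) > 0 for (x, y), lab in zip(points, labels))

# Hypothetical, linearly separable toy data.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
labels = [-1, -1, +1, +1]

# Two candidate boundaries, the vertical lines x = 1.5 and x = 2.0.
candidates = [((1.0, 0.0), -1.5), ((1.0, 0.0), -2.0)]
best = max((c for c in candidates if separates(*c, points, labels)),
           key=lambda c: margin(*c, points))
print(best, margin(*best, points))
```

An actual SVM finds the maximum-margin boundary by solving an optimisation problem over all possible boundaries rather than comparing a fixed list of candidates.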
Clustering Overview
What is clustering?
– Segment data into groups, based on data similarity
Why use clustering?
– Identify outliers
– The resulting groups may themselves be the matter of interest
How is clustering done?
– Can be achieved by various algorithms
– It is an iterative process (involving trial and error)
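One of the algorithms named earlier in the deck, k-means, shows the iterative character of clustering. The sketch below is a minimal plain-Python version of Lloyd's algorithm on hypothetical 2-D points; the function name `kmeans`, its parameters, and the data are assumptions for illustration and this is not the MATLAB implementation.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm: alternate between assigning points and moving centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # start from k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical data: two visibly separate groups.
group_a = [(0.1, 0.1), (0.2, 0.1), (0.1, 0.2)]
group_b = [(0.5, 0.9), (0.6, 0.8), (0.5, 0.8)]
centroids, clusters = kmeans(group_a + group_b, k=2)
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

The result depends on the number of clusters `k` and on the starting centroids, which is where the trial and error mentioned above comes in.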
Machine Learning with MATLAB
Interactive environment
– Visual tools for exploratory data analysis
– Easy to evaluate and choose best algorithm
– Apps available to help you get started (e.g., neural network tool, curve fitting tool)
Multiple algorithms to choose from
– Clustering
– Classification
– Regression
Learn More: Machine Learning with MATLAB
– Classification with MATLAB
– Credit Risk Modeling with MATLAB
– Multivariate Classification in the Life Sciences
– Electricity Load and Price Forecasting
– Data Driven Fitting with MATLAB
– Regression with MATLAB
http://www.mathworks.com/discovery/machine-learning.html
Questions?