Formal Evaluation Techniques
Chapter 7
• Test set error rates, confusion matrices, and lift charts
• Focus: formal evaluation methods for supervised learning and unsupervised clustering
7.1 What Should Be Evaluated?
1. Supervised Model
2. Training Data
3. Attributes
4. Model Builder
5. Parameters
6. Test Set Evaluation
[Figure: Model Builder, Parameters, Supervised Model, Evaluation, Data (Instances, Attributes), Training Data, Test Data]
7.2 Tools for Evaluation
Single-Valued Summary Statistics
• Mean
• Variance
• Standard deviation
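As a quick illustration of the three summary statistics, here is a minimal Python sketch. It assumes the population form of the variance (dividing by n rather than n - 1). The function name `summary_stats` and the sample values are hypothetical.

```python
import math

def summary_stats(values):
    """Return (mean, variance, standard deviation) for a list of numbers.
    Assumption: population variance (divide by n); use n - 1 for a sample variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    return mean, variance, math.sqrt(variance)

print(summary_stats([2, 4, 4, 4, 5, 5, 7, 9]))  # (5.0, 4.0, 2.0)
```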
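[Figure: The Normal Distribution. Curve f(x) plotted against x, measured in standard deviations from the mean. Area under the curve on each side of the mean: 34.13% between the mean and 1 SD, 13.59% between 1 and 2 SD, 2.14% between 2 and 3 SD, and 0.13% beyond 3 SD.]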
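The band areas for the normal curve can be checked numerically from the standard normal cumulative distribution function. The minimal sketch below builds that function from Python's `math.erf`. The helper names `phi`, `BANDS`, and `band_percentages` are hypothetical.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function, built from math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# One-sided bands, in standard deviations from the mean.
BANDS = [(0, 1), (1, 2), (2, 3), (3, math.inf)]

def band_percentages():
    """Percentage of the total area under the curve that lies in each band."""
    return [round(100 * (phi(hi) - phi(lo)), 2) for lo, hi in BANDS]

print(band_percentages())                 # [34.13, 13.59, 2.14, 0.13]
print(round(100 * (phi(2) - phi(-2)), 2)) # 95.45: area within two SDs on both sides
```

The last line shows why "two standard errors" is the usual shorthand for 95% confidence: roughly 95.45% of the area lies within two standard deviations of the mean.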
Normal Distributions and Sample Means
• The means of random sets of independent samples of equal size are (approximately) normally distributed.
• A sample mean will lie within two standard errors of the population mean about 95% of the time.
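The two claims above can be checked with a small simulation. The sketch assumes a uniform(0, 1) population, whose mean (0.5) and variance (1/12) are known exactly. It draws many samples of equal size and counts how often the sample mean lands within two standard errors of the population mean. The sample size, trial count, seed, and function name are illustrative choices.

```python
import math
import random

def fraction_within_two_se(sample_size=30, trials=10_000, seed=42):
    """Simulate repeated samples from a uniform(0, 1) population (an assumption)
    and return the fraction of sample means within 2 standard errors of 0.5."""
    rng = random.Random(seed)
    pop_mean = 0.5            # mean of uniform(0, 1)
    pop_var = 1.0 / 12.0      # variance of uniform(0, 1)
    se = math.sqrt(pop_var / sample_size)
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.random() for _ in range(sample_size)) / sample_size
        if abs(sample_mean - pop_mean) < 2 * se:
            hits += 1
    return hits / trials

print(fraction_within_two_se())  # roughly 0.95
```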
Computing the Standard Error
• The variance of the sample mean is estimated by dividing the sample variance by the sample size.
• The standard error is the square root of this estimated variance.
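Here is a minimal Python version of the two steps. It assumes the sample variance uses the n - 1 divisor, which is what `statistics.variance` computes. The name `standard_error` and the data are hypothetical.

```python
import math
import statistics

def standard_error(sample):
    """Square root of (sample variance / sample size).
    statistics.variance uses the n - 1 divisor (an assumption about 'sample variance')."""
    return math.sqrt(statistics.variance(sample) / len(sample))

print(round(standard_error([2, 4, 4, 4, 5, 5, 7, 9]), 4))  # 0.7559
```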
[Figure: Three random samples (Sample 1, Sample 2, Sample 3) drawn from a population of instances X1 through X10]
A Classical Model for Hypothesis Testing
• Hypothesis: an educated guess about the outcome of some event
• Experimental group (receives the treatment) and control group (receives a placebo)
• Null hypothesis: there is no significant difference in the mean increase or decrease of total allergic reactions per day between patients in the group receiving treatment X and patients in the group receiving the placebo.
A Classical Model for Hypothesis Testing
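$$P = \frac{|\bar{X}_1 - \bar{X}_2|}{\sqrt{v_1/n_1 + v_2/n_2}}$$
where
$P$ is the significance score;
$\bar{X}_1$ and $\bar{X}_2$ are sample means for the independent samples;
$v_1$ and $v_2$ are variance scores for the respective means; and
$n_1$ and $n_2$ are the corresponding sample sizes.
To be 95% confident that the difference is significant, P must be greater than or equal to 2.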
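The significance score translates directly into a short function. This is a sketch only. The group means, variances, and sizes in the example are hypothetical numbers for the treatment X and placebo groups, and the name `significance_score` is illustrative.

```python
import math

def significance_score(mean1, mean2, var1, var2, n1, n2):
    """P = |mean1 - mean2| / sqrt(var1/n1 + var2/n2)."""
    return abs(mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)

# Hypothetical data: mean allergic reactions per day, treatment vs. placebo.
p = significance_score(mean1=8.0, mean2=10.0, var1=4.0, var2=4.0, n1=16, n2=16)
print(round(p, 3), "significant" if p >= 2 else "not significant")  # 2.828 significant
```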
Table 7.1 • A Confusion Matrix for the Null Hypothesis
                          Computed Accept   Computed Reject
Accept Null Hypothesis    True Accept       Type 1 Error
Reject Null Hypothesis    Type 2 Error      True Reject
7.3 Computing Test Set Confidence Intervals
$$\text{Classifier Error Rate } (E) = \frac{\text{number of test set errors}}{\text{number of test set instances}}$$
Computing 95% Confidence Intervals
1. Given a test set sample S of size n and error rate E
2. Compute the sample variance as V = E(1 - E)
3. Compute the standard error as $SE = \sqrt{V/n}$, the square root of V divided by n
4. Calculate an upper bound error as E + 2(SE)
5. Calculate a lower bound error as E - 2(SE)
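The five steps, together with the error-rate formula, can be sketched in a few lines of Python. The function name `error_rate_interval` and the test set counts in the example are hypothetical.

```python
import math

def error_rate_interval(num_errors, num_instances):
    """Return (E, lower, upper): the test set error rate and its
    approximate 95% confidence bounds E - 2(SE) and E + 2(SE)."""
    n = num_instances
    e = num_errors / n          # classifier error rate
    v = e * (1 - e)             # sample variance V = E(1 - E)
    se = math.sqrt(v / n)       # standard error sqrt(V / n)
    return e, e - 2 * se, e + 2 * se

# Hypothetical test set: 10 errors among 100 instances.
e, lower, upper = error_rate_interval(10, 100)
print(f"E = {e:.2f}, 95% interval = [{lower:.2f}, {upper:.2f}]")
# E = 0.10, 95% interval = [0.04, 0.16]
```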
Three general comments
• The test data are assumed to have been randomly chosen from the pool of all possible test set instances.
• Test, training, and validation data must be disjoint sets.
• The instances of each class should be distributed among the training, validation, and test data in the same proportions as in the entire dataset.
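The second and third comments describe a disjoint, stratified split. Below is a minimal sketch of one way to produce one. The 60/20/20 proportions, the seed, and the name `stratified_split` are illustrative assumptions, not values from the text.

```python
import random
from collections import defaultdict

def stratified_split(instances, labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split instances into disjoint training, validation, and test lists.
    Each class is split separately, so class proportions are roughly preserved.
    The default 60/20/20 proportions are an illustrative assumption."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for inst, label in zip(instances, labels):
        by_class[label].append(inst)
    train, valid, test = [], [], []
    for members in by_class.values():
        rng.shuffle(members)                 # random choice within each class
        n = len(members)
        n_train = round(fractions[0] * n)
        n_valid = round(fractions[1] * n)
        train += members[:n_train]
        valid += members[n_train:n_train + n_valid]
        test += members[n_train + n_valid:]  # remainder goes to the test set
    return train, valid, test

# Hypothetical data: 70 instances of class "a" and 30 of class "b".
data = list(range(100))
classes = ["a" if i < 70 else "b" for i in data]
train, valid, test = stratified_split(data, classes)
print(len(train), len(valid), len(test))  # 60 20 20
```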
7.4 Comparing Supervised Learner Models
Comparing Models with Independent Test Data
$$P = \frac{|E_1 - E_2|}{\sqrt{q(1 - q)(1/n_1 + 1/n_2)}}$$
where
E1 = the error rate for model M1
E2 = the error rate for model M2
q = (E1 + E2)/2
n1 = the number of instances in test set A
n2 = the number of instances in test set B
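Here is a sketch of the comparison for two models evaluated on independent test sets. The error rates and test set sizes in the example are hypothetical, and `compare_error_rates` is an illustrative name.

```python
import math

def compare_error_rates(e1, e2, n1, n2):
    """P = |E1 - E2| / sqrt(q(1 - q)(1/n1 + 1/n2)), where q = (E1 + E2) / 2."""
    q = (e1 + e2) / 2
    return abs(e1 - e2) / math.sqrt(q * (1 - q) * (1 / n1 + 1 / n2))

# Hypothetical: M1 errs on 20% of test set A, M2 on 25% of test set B (100 instances each).
p = compare_error_rates(0.20, 0.25, 100, 100)
print(round(p, 2))  # 0.85, below 2: no significant difference at ~95% confidence
```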
7.5 Attribute Evaluation
Locating Redundant Attributes with Excel
• Correlation coefficient (r)
• Positive correlation
• Negative correlation
• Curvilinear relationship (curved line)
 – Two attributes having a low r value may still have a curvilinear relationship.
[Figure: Positive Correlation r = 1. Scatterplot of Attribute B against Attribute A]
[Figure: Negative Correlation r = -1. Scatterplot of Attribute B against Attribute A]
[Figure: Curvilinear Relationship r = 0. Scatterplot of Attribute B against Attribute A]
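Spreadsheet tools report the correlation coefficient directly. The sketch below computes Pearson's r by hand so the three cases above can be reproduced. The data are hypothetical, chosen to mimic the three scatterplots, and `pearson_r` is an illustrative name.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length attribute lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

a = list(range(11))                             # hypothetical Attribute A: 0..10
print(pearson_r(a, a))                          # 1.0   positive correlation
print(pearson_r(a, [10 - x for x in a]))        # -1.0  negative correlation
print(pearson_r(a, [(x - 5) ** 2 for x in a]))  # 0.0   curvilinear, yet r = 0
```

The third case shows the point made above: B depends entirely on A, but r is 0 because the relationship is not linear.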
Creating a Scatterplot Diagram with MS Excel
[Figure: Blood Pressure vs. Cholesterol. Scatterplot of Cholesterol against Blood Pressure]
Hypothesis Testing for Numerical Attribute Significance
$$P_{ij} = \frac{|\bar{X}_i - \bar{X}_j|}{\sqrt{v_i/n_i + v_j/n_j}}$$
where
$\bar{X}_i$ is the class i mean and $\bar{X}_j$ is the class j mean for attribute A;
$v_i$ is the class i variance and $v_j$ is the class j variance for attribute A; and
$n_i$ is the number of instances in $C_i$ and $n_j$ is the number of instances in $C_j$.
Table 7.2 • Cardiology Patient Data: Numerical Attribute Significance
Attribute   Statistic   Class Sick   Class Healthy   ESX Significance   Hypothesis Test for Significance
Age         Mean        56.50        52.50           0.45               4.076
            Sd          7.96         9.55
BP          Mean        134.40       129.30          0.29               2.511
            Sd          18.73        16.17
Chol        Mean        251.09       242.23          0.17               1.495
            Sd          49.46        53.55
MHR         Mean        139.10       158.47          0.85               7.955
            Sd          22.60        19.1
Peak        Mean        1.59         0.58            0.86               8.001
            Sd          1.30         0.78
7.6 Unsupervised Evaluation Techniques
• Unsupervised Clustering for Supervised Evaluation
 – If the instances cluster into the predefined classes contained in the training data, a supervised learner model built with the training data is likely to perform well.
• Supervised Evaluation for Unsupervised Clustering
 – Designate each formed cluster as a class.
 – Build a supervised learner model by choosing a random sampling of instances from each class.
 – Test the supervised learner model with the remaining instances.
• Additional Methods
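The three steps of supervised evaluation for unsupervised clustering are sketched below. The text does not name a supervised learner, so this sketch swaps in a simple nearest-centroid classifier as an assumption. The 70% training fraction, seed, function names, and points are all hypothetical.

```python
import random

def fit_centroids(points, labels):
    """Stand-in supervised learner (nearest centroid): one mean vector per class."""
    groups = {}
    for p, c in zip(points, labels):
        groups.setdefault(c, []).append(p)
    return {c: [sum(col) / len(ps) for col in zip(*ps)] for c, ps in groups.items()}

def predict(centroids, p):
    """Assign p to the class with the closest centroid (squared Euclidean distance)."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))

def evaluate_clusters(points, cluster_ids, train_fraction=0.7, seed=0):
    """Treat each cluster as a class, train on a random sample of each class,
    and return the accuracy on the remaining instances."""
    rng = random.Random(seed)
    by_cluster = {}
    for p, c in zip(points, cluster_ids):
        by_cluster.setdefault(c, []).append(p)
    train_x, train_y, test_x, test_y = [], [], [], []
    for c, members in by_cluster.items():
        members = members[:]                 # copy before shuffling
        rng.shuffle(members)
        k = max(1, int(train_fraction * len(members)))
        train_x += members[:k]
        train_y += [c] * k
        test_x += members[k:]
        test_y += [c] * (len(members) - k)
    centroids = fit_centroids(train_x, train_y)
    correct = sum(predict(centroids, p) == c for p, c in zip(test_x, test_y))
    return correct / len(test_x)

# Hypothetical clustering result: two well-separated clusters of five points each.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
       (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
ids = [0] * 5 + [1] * 5
print(evaluate_clusters(pts, ids))  # 1.0
```

High accuracy on the held-out instances suggests the clusters are well separated. Low accuracy suggests the cluster boundaries are hard to learn.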
Additional Methods
• Designate all instances as training data
• Apply an alternative technique’s measure of cluster quality
• Create your own measure of cluster quality
• Perform a between-cluster attribute-value comparison.
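One simple way to perform a between-cluster attribute-value comparison, assumed here rather than taken from the text, is to average each numeric attribute within each cluster and compare the averages side by side. The function name and the records below are hypothetical.

```python
def cluster_attribute_means(rows, cluster_ids, attributes):
    """Average each numeric attribute within each cluster.
    `rows` is a list of dicts keyed by attribute name (an illustrative format)."""
    sums, counts = {}, {}
    for row, c in zip(rows, cluster_ids):
        s = sums.setdefault(c, dict.fromkeys(attributes, 0.0))
        for a in attributes:
            s[a] += row[a]
        counts[c] = counts.get(c, 0) + 1
    return {c: {a: s[a] / counts[c] for a in attributes} for c, s in sums.items()}

# Hypothetical records assigned to two clusters.
rows = [{"age": 60, "chol": 250}, {"age": 50, "chol": 230}, {"age": 40, "chol": 200}]
print(cluster_attribute_means(rows, [0, 0, 1], ["age", "chol"]))
# {0: {'age': 55.0, 'chol': 240.0}, 1: {'age': 40.0, 'chol': 200.0}}
```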
7.7 Evaluating Supervised Models with Numeric Output
Mean Squared Error
$$mse = \frac{(a_1 - c_1)^2 + (a_2 - c_2)^2 + \cdots + (a_i - c_i)^2 + \cdots + (a_n - c_n)^2}{n}$$
where for the ith instance,
$a_i$ = actual output value
$c_i$ = computed output value
Mean Absolute Error
$$mae = \frac{|a_1 - c_1| + |a_2 - c_2| + \cdots + |a_n - c_n|}{n}$$
where for the ith instance,
$a_i$ = actual output value
$c_i$ = computed output value
Table 7.3 • Absolute and Squared Error
Instance   Life Ins. Promo.   Computed   Absolute   Squared
Number     Actual Output      Output     Error      Error
1          0.0                0.024      0.024      0.0005
2          1.0                0.998      0.002      0.0000
3          0.0                0.023      0.023      0.0005
4          1.0                0.986      0.014      0.0002
5          1.0                0.999      0.001      0.0000
6          0.0                0.050      0.050      0.0025
7          1.0                0.999      0.001      0.0000
8          0.0                0.262      0.262      0.0686
9          0.0                0.060      0.060      0.0036
10         1.0                0.997      0.003      0.0000
11         1.0                0.999      0.001      0.0000
12         1.0                0.776      0.224      0.0502
13         1.0                0.999      0.001      0.0000
14         0.0                0.023      0.023      0.0005
15         1.0                0.999      0.001      0.0000
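The mse and mae formulas can be applied to the actual and computed outputs listed in Table 7.3. This is a minimal sketch, and the function names are illustrative.

```python
def mean_squared_error(actual, computed):
    """mse: sum of (a_i - c_i)^2 over all instances, divided by n."""
    return sum((a - c) ** 2 for a, c in zip(actual, computed)) / len(actual)

def mean_absolute_error(actual, computed):
    """mae: sum of |a_i - c_i| over all instances, divided by n."""
    return sum(abs(a - c) for a, c in zip(actual, computed)) / len(actual)

# Actual and computed outputs for instances 1-15 of Table 7.3.
actual = [0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0]
computed = [0.024, 0.998, 0.023, 0.986, 0.999, 0.050, 0.999, 0.262,
            0.060, 0.997, 0.999, 0.776, 0.999, 0.023, 0.999]

print(round(mean_absolute_error(actual, computed), 4))  # 0.046
print(round(mean_squared_error(actual, computed), 4))   # 0.0085
```

The mse here is computed from unrounded differences. It can therefore differ slightly from an average of the rounded Squared Error column, which gives about 0.0084.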