Reliable Probability Forecasting: a Machine Learning Perspective
David Lindsay
Supervisors: Zhiyuan Luo, Alex Gammerman, Volodya Vovk
Overview
What is probability forecasting? Reliability and resolution criteria
Experimental design
Problems with traditional assessment methods: square loss, log loss and ROC curves
Probability Calibration Graph (PCG)
Traditional learners are unreliable yet accurate!
Extension of Venn Probability Machine (VPM)
Which learners are reliable?
Psychological and theoretical viewpoint
Probability Forecasting
Qualified predictions are important in many applications (especially medicine).
Most machine learning algorithms make bare predictions.
Those that do make qualified predictions make no claims of how effective the measures are!
Probability Forecasting: Generalisation of Pattern Recognition
Goal of pattern recognition = find the best label for each new test object.
Example: Abdominal Pain dataset. Each object x_i holds the patient's details and each label y_i is a diagnosis.
Training set to learn from:
(Name: David, Sex: M, Height: 62) -> Appendicitis
(Name: Daniil, Sex: M, Height: 64) -> Dyspepsia
(Name: Mark, Sex: M, Height: 61) -> Non-specific
...
(Name: Sian, Sex: F, Height: 58) -> Dyspepsia
Test object, what is the true label? (true label unknown or withheld from learner):
(Name: Wilma, Sex: F, Height: 56) -> ?
Probability Forecasting: Generalisation of Pattern Recognition
A probability forecast estimates the conditional probability Pr(y | x) of a label y given an observed object x.
We want the learner to estimate probabilities for all possible class labels. For example, given the training set and the test object x = (Name: Helen, Sex: F, Height: 56), the learner outputs:
P(Dyspepsia | x) = 0.1
P(Appendicitis | x) = 0.7
P(Non-specific | x) = 0.2
etc.
Probability forecasting more formally
Let X be the object space, Y the label space, and Z = X × Y the example space.
Our learner makes probability forecasts for all possible labels:

$$z_1, z_2, \ldots, z_n, x_{n+1} \mapsto P(y_{n+1} = 1 \mid x_{n+1}), P(y_{n+1} = 2 \mid x_{n+1}), \ldots, P(y_{n+1} = |Y| \mid x_{n+1})$$

Use the probability forecasts to predict the most likely label:

$$\hat{y}_{n+1} = \arg\max_{i \in Y} P(y_{n+1} = i \mid x_{n+1})$$
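As a minimal illustration of this prediction rule, the sketch below reuses the forecast values from the Helen example (it is not output from any learner in the talk):

```python
# Hypothetical probability forecasts for one test object (values from the
# earlier slide).
forecasts = {"Dyspepsia": 0.1, "Appendicitis": 0.7, "Non-specific": 0.2}

# Predict the most likely label: argmax over the forecast probabilities.
predicted_label = max(forecasts, key=forecasts.get)
print(predicted_label)  # Appendicitis
```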
Back to the plan
What is probability forecasting? Reliability and resolution criteria
Experimental design
Problems with traditional assessment methods: square loss, log loss and ROC curves
Probability Calibration Graph (PCG)
Traditional learners are unreliable yet accurate!
Extension of Venn Probability Machine (VPM)
Which learners are reliable?
Psychological and theoretical viewpoint
Studies of Probability Forecasting
Probability forecasting has been a well studied area since the 1970s: psychology, statistics, meteorology.
These studies assessed two criteria of probability forecasts:
Reliability = the probability forecasts should not lie
Resolution = the probability forecasts are practically useful
Reliability
When an event is predicted with probability p, it should have approximately 1 - p chance of being incorrect.
a.k.a. well calibrated; considered an asymptotic property.
Dawid (1985) proved that no deterministic learner can be reliable for all data, but it is still interesting to investigate.
This property is often overlooked in practical studies!
Resolution
Probability forecasts are practically useful, e.g. they can be used to rank the labels in order of likelihood!
Closely related to classification accuracy, a common focus of machine learning.
Separate from reliability, i.e. the two do not go hand in hand (Lindsay, 2004).
Back to the plan
What is probability forecasting? Reliability and resolution criteria
Experimental design
Problems with traditional assessment methods: square loss, log loss and ROC curves
Probability Calibration Graph (PCG)
Traditional learners are unreliable yet accurate!
Extension of Venn Probability Machine (VPM)
Which learners are reliable?
Psychological and theoretical viewpoint
Experimental design
Tested several learners on many datasets in the online setting:
ZeroR = control
K-Nearest Neighbour
Neural Network
C4.5 Decision Tree
Naïve Bayes
Venn Probability Machine meta-learner (see later)
The Online Learning Setting
Before: 2 7 6 1 7 ? ?
After:  2 7 6 1 7 2 ?
The learning machine makes a prediction for each new example (label withheld).
The training data is then updated with the true label for the learning machine's next trial.
Repeat the process for all examples.
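A minimal sketch of this protocol; the fit/predict_proba interface is my assumption, not the talk's WEKA code:

```python
# Online setting: forecast each example before its label is revealed,
# then add the labelled example to the training data for the next trial.
def online_run(learner, examples):
    train, forecasts = [], []
    for x, y in examples:
        learner.fit(train)                          # learn from examples seen so far
        forecasts.append(learner.predict_proba(x))  # forecast with label withheld
        train.append((x, y))                        # label revealed; update training data
    return forecasts
```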
Lots of benchmark data
Tested on data available from the UCI Machine Learning repository:
Abdominal Pain: 6387 examples, 135 features, 9 classes, noisy
Diabetes: 768 examples, 8 features, 2 classes
Heart-Statlog: 270 examples, 13 features, 2 classes
Wisconsin Breast Cancer: 685 examples, 10 features, 2 classes
American Votes: 435 examples, 16 features, 2 classes
Lymphography: 148 examples, 18 features, 4 classes
Credit Card Applications: 690 examples, 15 features, 2 classes
Iris Flower: 150 examples, 4 features, 3 classes
And many more
Programs
Extended the WEKA data mining system implemented in Java:
Added the VPM meta-learner to the existing library of algorithms
Allowed learners to be tested in the online setting
Created Matlab scripts to easily create plots (see later)
Results, papers and website
All results that I discuss today can be found in my 3 tech reports:
The Probability Calibration Graph - a useful visualisation of the reliability of probability forecasts, Lindsay (2004), CLRC-TR-04-01
Multi-class probability forecasting using the Venn Probability Machine - a comparison with traditional machine learning methods, Lindsay (2004), CLRC-TR-04-02
Rapid implementation of Venn Probability Machines, Lindsay (2004), CLRC-TR-04-03
And on my website: http://www.david-lindsay.co.uk/research.html
Back to the plan
What is probability forecasting? Reliability and resolution criteria
Experimental design
Problems with traditional assessment methods: square loss, log loss and ROC curves
Probability Calibration Graph (PCG)
Traditional learners are unreliable yet accurate!
Extension of Venn Probability Machine (VPM)
Which learners are reliable?
Psychological and theoretical viewpoint
Loss Functions
Square loss:

$$L_{sq} = \frac{1}{n} \sum_{i=1}^{n} \sum_{j \in Y} \left( I_{\{y_i = j\}} - p_i^j \right)^2$$

Log loss:

$$L_{log} = -\frac{1}{n} \sum_{i=1}^{n} \sum_{j \in Y} I_{\{y_i = j\}} \log p_i^j$$

where p_i^j is the forecast probability that example i has label j.
There are many other possible loss functions.
DeGroot and Fienberg (1982) showed that all loss functions measure a mixture of reliability and resolution.
Log loss punishes more harshly: the learner is forced to spread its bets.
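A minimal sketch of these two loss functions, following the formulas above (the data structures are my assumption):

```python
import math

# forecasts: list of dicts mapping every label -> probability;
# labels: the corresponding true labels.
def square_loss(forecasts, labels):
    return sum((float(j == y) - p[j]) ** 2
               for p, y in zip(forecasts, labels) for j in p) / len(labels)

def log_loss(forecasts, labels):
    # Only the true label's term survives the indicator; a forecast of 0
    # for the true label is punished with infinite loss.
    return -sum(math.log(p[y]) for p, y in zip(forecasts, labels)) / len(labels)
```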
ROC Curves
Naïve Bayes on the Abdominal Pain dataset:
1. The graph shows the trade-off between false and true positive predictions.
2. We want the curve to be as close to the upper left corner as possible (away from the diagonal).
3. My results show that this graph tests resolution.
4. The area under the curve provides a measure of the quality of the probability forecasts.
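A minimal sketch of how such a curve can be traced for a binary problem by sweeping a threshold over the forecast probabilities (my illustration, not the talk's implementation; ties are handled naively):

```python
# scores: forecast probabilities for the positive class; labels: 0/1 truths.
def roc_points(scores, labels):
    ranked = sorted(zip(scores, labels), reverse=True)
    pos, neg = sum(labels), len(labels) - sum(labels)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, y in ranked:                      # lower the threshold one example at a time
        if y == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))  # (false positive rate, true positive rate)
    return points
```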
Problems with Traditional Assessment
Loss functions and ROC give more information than error rate about the quality of probability forecasts.
But:
loss functions = a mixture of resolution and reliability
ROC curve = measures resolution
We don't have any method of solely assessing reliability.
We don't have a method of telling if probability forecasts are over- or under-estimated.
Back to the plan
What is probability forecasting? Reliability and resolution criteria
Experimental design
Problems with traditional assessment methods: square loss, log loss and ROC curves
Probability Calibration Graph (PCG)
Traditional learners are unreliable yet accurate!
Extension of Venn Probability Machine (VPM)
Which learners are reliable?
Psychological and theoretical viewpoint
Inspiration for PCG (Meteorology)
Murphy & Winkler (1977): calibration data for precipitation forecasts.
Reliable points lie close to the diagonal.
A PCG plot of ZeroR on Abdominal Pain
[Figure: PCG coordinates plotted against the line of calibration; x-axis = predicted probability, y-axis = empirical frequency of being correct]
Reliability: the PCG coordinates lie close to the line of calibration, i.e. ZeroR is not accurate but it is reliable!
The plot may not span the whole axis: ZeroR makes no predictions with high probability.
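A minimal sketch of how PCG coordinates can be computed; binning the forecasts into intervals is my assumption, since the talk does not spell out how points are grouped:

```python
from collections import defaultdict

# probs: forecast probability attached to each prediction;
# correct: 1 if that prediction was right, else 0.
def pcg_points(probs, correct, n_bins=10):
    bins = defaultdict(list)
    for p, c in zip(probs, correct):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, c))
    # x = mean predicted probability in the bin, y = empirical frequency of
    # being correct; reliable forecasts keep every (x, y) near the line y = x.
    return sorted((sum(p for p, _ in b) / len(b),
                   sum(c for _, c in b) / len(b)) for b in bins.values())
```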
PCG: a visualisation tool and measure of reliability

Naïve Bayes:
Total 2764.5, Mean 0.0483, Standard Deviation 0.0757, Max 0.4203, Min 4.9e-17
Over- and under-estimates its probabilities, much like real doctors!
Unreliable: a forecast of 0.9 has only a 0.55 chance of being right (over-estimate), and a forecast of 0.1 has a 0.3 chance of being right (under-estimate).

VPM Naïve Bayes:
Total 496.7, Mean 0.0087, Standard Deviation 0.0112, Max 0.1017, Min 9.2e-8
VPM is reliable, as its PCG follows the diagonal!
Learners predicting like people!
[Figures: PCG plots for Naïve Bayes and for people]
Lots of psychological research shows that people make unreliable probability forecasts.
Back to the plan
What is probability forecasting? Reliability and resolution criteria
Experimental design
Problems with traditional assessment methods: square loss, log loss and ROC curves
Probability Calibration Graph (PCG)
Traditional learners are unreliable yet accurate!
Extension of Venn Probability Machine (VPM)
Which learners are reliable?
Psychological and theoretical viewpoint
Table comparing scores with PCG
838.1 (4)0.76 (1)0.8 (4)0.54 (5)40.7 (8)VPM C4.5
2764.5 (7)0.72 (5)1.3 (7)0.50 (4)29.2 (2)Nave Bayes
496.7 (1)0.75 (2)0.6 (1)0.44 (1)28.9 (1)VPM Nave Bayes
5062.9 (11)0.54 (10)2.6 (10)1.0 (11)33.4 (4)10-NN
4492.7 (10)0.55 (9)2.2 (9)0.96 (10)33.4 (4)20-NN
3481.2 (8)0.57 (8)3.3 (11)0.67 (7)39.6 (7)C4.5
1320.5 (6)0.75 (3)0.72 (2)0.45 (2)30.5 (3)Neural Net
921.2 (5)0.74 (4)0.73 (3)0.47 (3)34.3 (5)30-NN
554.6 (2)0.61 (6)0.9 (5)0.58 (6)41.6 (9)VPM 1-NN
4307.5 (9)0.59 (7)2.1 (8)0.73 (8)34.6 (6)1-NN
678.6 (3)0.49 (11)1.1 (6)0.74 (9)55.6 (10)ZeroR
PCGROC
Area
Log
Loss
Sqr
Loss
ErrorAlgorithm
Correlations of scores
(Sqr Reliability and Sqr Resolution are the reliability and resolution components of the square loss.)

Scores                   Corr. Coeff.  Interpretation
PCG vs. Sqr Reliability  0.76          Direct, strong
PCG vs. Sqr Resolution   0.04          Direct, none
PCG vs. Error            0.26          Direct, weak
ROC vs. Sqr Reliability  -0.1          Inverse, none
ROC vs. Sqr Resolution   0.67          Direct, strong
ROC vs. Error            -0.52         Inverse, moderate
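For reference, a minimal sketch of a correlation coefficient computation (assuming Pearson's r; the talk does not say which coefficient was used):

```python
# Pearson correlation coefficient between two lists of scores.
def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5
```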
Back to the plan
What is probability forecasting? Reliability and resolution criteria
Experimental design
Problems with traditional assessment methods: square loss, log loss and ROC curves
Probability Calibration Graph (PCG)
Traditional learners are unreliable yet accurate!
Extension of Venn Probability Machine (VPM)
Which learners are reliable?
Psychological and theoretical viewpoint
What is the VPM meta-learner?
Volodya's VPM:
1. Predicts a label
2. Produces upper (u) and lower (l) bounds for the predicted label only
My VPM extension:
1. Extracts more information
2. Produces a probability forecast for all possible labels
3. Predicts a label using these probability forecasts
4. Produces Volodya's bounds as well!
The VPM meta-learning framework sits on top of an existing learner to complement its predictions with probability estimates.
Volodya's original use of VPM
[Figure: error rate and bounds vs. online trial number. The upper (red) and lower (green) bounds lie above and below the actual number of errors (black) made on the data.]
Upper error: 2216.5 (34.7%)
Errors: 1835 (28.9%)
Lower error: 1414.1 (22.1%)
Output from VPM compared with that of the original underlying learner
Key: Predicted = underlined, Actual =

Naïve Bayes (probability forecast for each class label, plus bounds):
Trial#  Appx     Div.    Perf.Pept.  Non.Spec  Choli   Intest obstr  Pancr    Renal.   Dysp.   Up   Low
5831    0.93     2.9e-9  1.7e-13     0.07      1.3e-9  2.2e-9        4.0e-11  6.3e-10  7.6e-9  NA   NA
2490    9.4e-5   0.01    0.17        2.3e-5    0.16    0.46          0.2      2.2e-7   2.2e-4  NA   NA
1653    3.08e-9  4.5e-6  3.3e-6      4.4e-5    0.99    4.2e-3        3.4e-3   4.1e-10  1.3e-4  NA   NA

VPM Naïve Bayes:
Trial#  Appx     Div.    Perf.Pept.  Non.Spec  Choli   Intest obstr  Pancr    Renal.   Dysp.   Up    Low
5831    0.53     0.01    0.0         0.42      0.01    0.01          0.0      0.01     0.01    0.68  0.41
2490    0.02     0.03    0.10        0.07      0.05    0.15          0.08     0.09     0.4     0.71  0.07
1653    0.03     0.0     0.03        0.08      0.73    0.0           0.04     0.01     0.09    0.82  0.08
Back to the plan
What is probability forecasting? Reliability and resolution criteria
Experimental design
Problems with traditional assessment methods: square loss, log loss and ROC curves
Probability Calibration Graph (PCG)
Traditional learners are unreliable yet accurate!
Extension of Venn Probability Machine (VPM)
Which learners are reliable?
Psychological and theoretical viewpoint
ZeroR
[Figures: PCG plots for ZeroR on Heart Disease, Lymphography and Diabetes]
ZeroR outputs probability forecasts which are mere label frequencies.
ZeroR predicts the majority class label at each trial.
It uses no information about the objects in its learning; it is the simplest of all learners.
Its accuracy is poor, but its reliability is good.
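A minimal sketch of ZeroR as a probability forecaster (the function signature is my assumption):

```python
from collections import Counter

# ZeroR ignores the object entirely: its forecast for each label is just
# that label's frequency among the labels observed so far.
def zeror_forecast(observed_labels, all_labels):
    if not observed_labels:                 # nothing seen yet: uniform forecast
        return {y: 1 / len(all_labels) for y in all_labels}
    counts = Counter(observed_labels)
    n = len(observed_labels)
    return {y: counts[y] / n for y in all_labels}
```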
K-NN
[Figures: PCG plots for 10-NN, 20-NN and 30-NN]
K-NN finds the subset of the K closest (nearest neighbouring) examples in the training data using a distance metric, then counts the label frequencies amongst this subset.
It acts like a more sophisticated version of ZeroR that uses the information held in the object.
An appropriate choice of K must be made to obtain reliable probability forecasts (it depends on the data). A sketch follows below.
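A minimal sketch of K-NN probability forecasting; the Euclidean distance metric and the data format are my assumptions:

```python
import math
from collections import Counter

# train: list of (feature_vector, label) pairs; x: test feature vector.
def knn_forecast(train, x, k, all_labels):
    # Find the K nearest neighbours under Euclidean distance...
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], x))[:k]
    # ...then, like ZeroR restricted to this subset, use label frequencies.
    counts = Counter(label for _, label in nearest)
    return {y: counts[y] / len(nearest) for y in all_labels}
```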
Traditional Learners and VPM
Traditional learners can be very unreliable (yet accurate); it depends on the data.
My research shows empirically that VPM is reliable.
It can also recalibrate a learner's original probability forecasts to make them more reliable!
The improvement in reliability often comes without detriment to classification accuracy.
[Figures: PCG plots for Naïve Bayes vs. VPM Naïve Bayes, C4.5 vs. VPM C4.5, Neural Net vs. VPM Neural Net, and 1-NN vs. VPM 1-NN]
Back to the plan
What is probability forecasting? Reliability and resolution criteria
Experimental design
Problems with traditional assessment methods: square loss, log loss and ROC curves
Probability Calibration Graph (PCG)
Traditional learners are unreliable yet accurate!
Extension of Venn Probability Machine (VPM)
Which learners are reliable?
Psychological and theoretical viewpoint
Psychological Heuristics
When faced with the difficult task of judging probability, people employ a limited number of heuristics which reduce the judgements to simpler ones:
Availability - an event is predicted more likely to occur if it has occurred frequently in the past
Representativeness - one compares the essential features of the event to those of the structure of previous events
Simulation - the ease with which the simulation of a system of events reaches a particular state can be used to judge the propensity of the (real) system to produce that state
Interpretation of reliable learners using heuristics
ZeroR, K-NN and the VPM learners are reliable probability forecasters.
We can identify these heuristics in the learning algorithms.
Remember, the psychological research states: the more heuristics used, the more reliable the forecasts.
Psychological Interpretation of ZeroR
The simplest of all reliable probability forecasters uses 1 heuristic:
The learner merely counts the labels it has observed so far, and uses the frequencies of the labels as its forecasts (Availability).
Psychological Interpretation of K-NN
More sophisticated than the ZeroR learner, the K-NN learner uses 2 heuristics:
It uses the distance metric to find the subset of the K closest examples in the training set (Representativeness).
It then counts the label frequencies in the subset of K nearest neighbours to make its forecasts (Availability).
Psychological Interpretation of VPM
Even more sophisticated, the VPM meta-learner uses all 3 heuristics:
The VPM tries each new test example with all possible classifications (Simulation).
Then, under each tentative simulation, it clusters similar training examples into groups (Representativeness).
Finally, the VPM calculates the frequency of labels in each of these groups to make its forecasts (Availability).
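A rough sketch of these three steps; this is my simplified reading, not Vovk's exact algorithm, and the grouping function and the final averaging of the frequency rows are my assumptions:

```python
from collections import Counter

# train: list of (object, label) pairs; group_of: a taxonomy that assigns
# each example to a group, given the full example list (Representativeness).
def vpm_forecast(train, x, all_labels, group_of):
    rows = []
    for y_try in all_labels:                 # Simulation: try every label for x
        examples = train + [(x, y_try)]
        g = group_of((x, y_try), examples)   # group holding the test example
        members = [ex for ex in examples if group_of(ex, examples) == g]
        counts = Counter(label for _, label in members)  # Availability
        rows.append({y: counts[y] / len(members) for y in all_labels})
    # Collapse the per-simulation frequency rows into one forecast per label
    # (a simplification; the full VPM also yields upper and lower bounds).
    return {y: sum(r[y] for r in rows) / len(rows) for y in all_labels}
```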
Theoretical justifications
ZeroR can be proven to be asymptotically reliable (and experiments show it does well on finite data).
K-NN has lots of theory, e.g. Stone (1977), to support its convergence to the true probability distribution.
VPM has a lot of theoretical justification for finite data, using martingales.
Take home points
Probability forecasting is useful for real-life applications, especially medicine.
We want learners to be reliable and accurate.
The PCG can be used to check reliability.
ZeroR, K-NN and VPM provide consistently reliable probability forecasts.
Traditional learners (Naïve Bayes, Neural Net and Decision Tree) can provide unreliable forecasts.
VPM can be used to improve the reliability of probability forecasts without detriment to classification accuracy.
Acknowledgments
Supervision: Alex Gammerman, Volodya Vovk, Zhiyuan Luo
Mathematical advice: Daniil Riabko, Volodya Vovk, Teo Sharia
Proofreading: Zhiyuan Luo, Siân Cox
Graphics & design: Siân Cox
Catering: Siân Cox
Fin