Decision Tree
Rong Jin
Determine Mileage Per Gallon

mpg    cylinders  displacement  horsepower  weight  acceleration  modelyear  maker
good   4          low           low         low     high          75to78     asia
bad    6          medium        medium      medium  medium        70to74     america
bad    4          medium        medium      medium  low           75to78     europe
bad    8          high          high        high    low           70to74     america
bad    6          medium        medium      medium  medium        70to74     america
bad    4          low           medium      low     medium        70to74     asia
bad    4          low           medium      low     low           70to74     asia
bad    8          high          high        high    low           75to78     america
:      :          :             :           :       :             :          :
bad    8          high          high        high    low           70to74     america
good   8          high          medium      high    high          79to83     america
bad    8          high          high        high    low           75to78     america
good   4          low           low         low     low           79to83     america
bad    6          medium        medium      medium  high          75to78     america
good   4          medium        low         low     low           79to83     america
good   4          low           low         medium  high          79to83     america
bad    8          high          high        high    low           70to74     america
good   4          low           medium      low     medium        75to78     europe
bad    5          medium        medium      medium  medium        75to78     europe
A Decision Tree for Determining MPG
From slides of Andrew Moore
(Figure: the learned decision tree. For example, the record cylinders = 4, displacement = low, horsepower = low, weight = low, acceleration = high, modelyear = 75to78, maker = asia is routed to a leaf predicting mpg = good.)
Decision Tree Learning
- Extremely popular method
  - Credit risk assessment
  - Medical diagnosis
  - Market analysis
- Good at dealing with symbolic features
- Easy to comprehend, compared to logistic regression models and support vector machines
Representational Power
Q: Can trees represent arbitrary Boolean expressions? (Yes: a complete tree over all the attributes can encode any truth table.)
Q: How many Boolean functions are there over N binary attributes? (2^(2^N): one for each assignment of outputs to the 2^N input combinations.)
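The count can be checked by brute force for small N; the sketch below (my own illustration, not from the slides) enumerates every truth table over N = 2 binary attributes:

```python
from itertools import product

def num_boolean_functions(n):
    """Each of the 2**n input rows can independently map to 0 or 1."""
    return 2 ** (2 ** n)

n = 2
rows = list(product([0, 1], repeat=n))            # the 2**n input combinations
tables = list(product([0, 1], repeat=len(rows)))  # one output bit per row
print(len(tables))                                # 16 == num_boolean_functions(2)
```

A complete depth-N tree (one leaf per input row) realizes any one of these truth tables, which also answers the first question.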
How to Generate Trees from Training Data

A Simple Idea
- Enumerate all possible trees
- Check how well each tree matches the training data
- Pick the one that works best

Problems?
- Far too many trees to enumerate
- How do we determine the quality of a decision tree?
Solution: A Greedy Approach
- Choose the most informative feature
- Split the data set on that feature
- Recurse until each data item is classified correctly
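A minimal sketch of this greedy procedure, assuming categorical features stored as dicts (the helper names are my own, not from the slides):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(records, labels, feature):
    """Mutual information between one categorical feature and the class label."""
    n = len(labels)
    split = {}
    for r, y in zip(records, labels):
        split.setdefault(r[feature], []).append(y)
    return entropy(labels) - sum(len(ys) / n * entropy(ys) for ys in split.values())

def grow_tree(records, labels, features):
    # Base case: all records in this subset share one output -> leaf.
    if len(set(labels)) == 1:
        return labels[0]
    # Base case: nothing left to split on -> majority-class leaf.
    if not features:
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: split on the most informative feature, then recurse.
    best = max(features, key=lambda f: information_gain(records, labels, f))
    tree = {}
    for value in sorted(set(r[best] for r in records)):
        subset = [(r, y) for r, y in zip(records, labels) if r[best] == value]
        rs, ys = zip(*subset)
        tree[(best, value)] = grow_tree(list(rs), list(ys), features - {best})
    return tree

# toy usage: one binary feature that fully determines the label
print(grow_tree([{"a": "x"}, {"a": "y"}], ["good", "bad"], {"a"}))
# -> {('a', 'x'): 'good', ('a', 'y'): 'bad'}
```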
How to Determine the Best Feature?
- Which feature is more informative about MPG?
- What metric should be used?
Mutual Information!

Mutual Information for Selecting the Best Feature

    I(X; Y) = Σ_{x,y} P(x, y) log [ P(x, y) / (P(x) P(y)) ]

Y: MPG (good or bad), X: cylinders (3, 4, 6, 8)
Another Example: Playing Tennis
Split on Humidity: the full sample (9+, 5-) divides into
  High: (3+, 4-)    Norm: (6+, 1-)

    I(Humidity; Play) = Σ_{h ∈ {high, norm}} Σ_{c ∈ {+, -}} P(h, c) log [ P(h, c) / (P(h) P(c)) ] ≈ 0.151
Split on Wind: the full sample (9+, 5-) divides into
  Weak: (6+, 2-)    Strong: (3+, 3-)

    I(Wind; Play) = Σ_{w ∈ {weak, strong}} Σ_{c ∈ {+, -}} P(w, c) log [ P(w, c) / (P(w) P(c)) ] ≈ 0.048
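These two numbers can be reproduced directly from the joint counts in the splits above (a quick check; the helper function is my own, not from the slides):

```python
import math

def mutual_information(joint):
    """I(X; Y) = sum over x, y of P(x, y) log2[ P(x, y) / (P(x) P(y)) ],
    where `joint` maps (x, y) pairs to counts."""
    n = sum(joint.values())
    px, py = {}, {}
    for (x, y), c in joint.items():
        px[x] = px.get(x, 0) + c
        py[y] = py.get(y, 0) + c
    return sum((c / n) * math.log2((c / n) / (px[x] / n * py[y] / n))
               for (x, y), c in joint.items() if c > 0)

# joint counts read off the two splits of the 14 tennis examples
humidity = {("high", "+"): 3, ("high", "-"): 4, ("norm", "+"): 6, ("norm", "-"): 1}
wind = {("weak", "+"): 6, ("weak", "-"): 2, ("strong", "+"): 3, ("strong", "-"): 3}

print(round(mutual_information(humidity), 3))  # 0.152 (the slides round down to 0.151)
print(round(mutual_information(wind), 3))      # 0.048
```

Humidity carries more information about Play than Wind does, so the greedy learner would split on Humidity first.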
Prediction for Nodes
What is the prediction for each node? (A natural choice: predict the majority class among the training records that reach the node.)

Prediction for Nodes
Recursively Growing Trees
Original Dataset

Partition it according to the value of the attribute we split on:
cylinders = 4
cylinders = 5
cylinders = 6
cylinders = 8
Recursively Growing Trees
cylinders = 4
cylinders = 5
cylinders = 6
cylinders = 8
Build a tree from each subset of records.
A Two Level Tree
Recursively growing trees
When Should We Stop Growing Trees?

Should we split this node?
Base Cases
- Base Case One: If all records in the current data subset have the same output, then don't recurse.
- Base Case Two: If all records have exactly the same set of input attributes, then don't recurse.
Base Cases: An Idea
Proposed Base Case 3: If all attributes have zero information gain, then don't recurse.

Is this a good idea?
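It turns out not to be: for XOR-like data, every individual attribute has zero information gain, yet a two-level tree separates the classes perfectly. A self-contained check (the helper names are my own, not from the slides):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(records, labels, feature):
    n = len(labels)
    split = {}
    for r, y in zip(records, labels):
        split.setdefault(r[feature], []).append(y)
    return entropy(labels) - sum(len(ys) / n * entropy(ys) for ys in split.values())

# y = a XOR b: neither attribute alone tells us anything about y ...
records = [{"a": 0, "b": 0}, {"a": 0, "b": 1}, {"a": 1, "b": 0}, {"a": 1, "b": 1}]
labels = [0, 1, 1, 0]
print(information_gain(records, labels, "a"))  # 0.0
print(information_gain(records, labels, "b"))  # 0.0

# ... yet after splitting on "a", the other attribute separates perfectly.
subset = [(r, y) for r, y in zip(records, labels) if r["a"] == 0]
rs, ys = zip(*subset)
print(information_gain(list(rs), list(ys), "b"))  # 1.0
```

So stopping as soon as all gains are zero would throw away learnable structure.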
Old Topic: Overfitting

What should we do?

Pruning
Pruning Decision Trees
- Option 1: Stop growing the tree in time.
- Option 2: Build the full decision tree as before; when you can grow it no more, start to prune:
  - Reduced-error pruning
  - Rule post-pruning
Reduced-Error Pruning
- Split the data into a training set and a validation set.
- Build a full decision tree over the training set.
- Repeatedly remove the node whose removal most increases validation-set accuracy, until no removal helps.
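The loop above can be sketched over a hypothetical dict-based tree representation (the structure and helper names are my own assumptions, not from the slides):

```python
import copy

# A tree is either a class label (a leaf) or a dict such as
#   {"feature": "a", "children": {value: subtree, ...}, "majority": label}
# where "majority" caches the majority class of the training records that
# reached this node, so the node can later be collapsed into a leaf.

def classify(tree, record):
    while isinstance(tree, dict):
        tree = tree["children"].get(record[tree["feature"]], tree["majority"])
    return tree

def accuracy(tree, records, labels):
    return sum(classify(tree, r) == y for r, y in zip(records, labels)) / len(labels)

def internal_nodes(tree, path=()):
    """Yield the path (sequence of child values) to every internal node."""
    if isinstance(tree, dict):
        yield path
        for value, sub in tree["children"].items():
            yield from internal_nodes(sub, path + (value,))

def prune_at(tree, path):
    """Return a copy of the tree with the node at `path` replaced by a leaf."""
    tree = copy.deepcopy(tree)
    parent, key, node = None, None, tree
    for value in path:
        parent, key, node = node, value, node["children"][value]
    if parent is None:
        return node["majority"]
    parent["children"][key] = node["majority"]
    return tree

def reduced_error_prune(tree, val_records, val_labels):
    # Greedily apply the single most helpful removal until none improves.
    while isinstance(tree, dict):
        best_acc, best_tree = accuracy(tree, val_records, val_labels), None
        for path in internal_nodes(tree):
            candidate = prune_at(tree, path)
            acc = accuracy(candidate, val_records, val_labels)
            if acc > best_acc:
                best_acc, best_tree = acc, candidate
        if best_tree is None:
            break
        tree = best_tree
    return tree

# demo: the test on "b" hurts on validation data, so it gets collapsed
inner = {"feature": "b", "children": {0: "bad", 1: "good"}, "majority": "bad"}
full = {"feature": "a", "children": {0: "good", 1: inner}, "majority": "good"}
val_records = [{"a": 1, "b": 0}, {"a": 1, "b": 1}, {"a": 0, "b": 0}]
val_labels = ["bad", "bad", "good"]
pruned = reduced_error_prune(full, val_records, val_labels)
print(pruned)  # {'feature': 'a', 'children': {0: 'good', 1: 'bad'}, 'majority': 'good'}
```

Caching the majority class at each node is what makes the collapse-to-leaf step cheap.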
(Figures: the original decision tree vs. the pruned decision tree.)
Rule Post-Pruning
- Convert the tree into rules, one rule per root-to-leaf path.
- Prune each rule by removing preconditions that do not hurt its estimated accuracy.
- Sort the final rules by their estimated accuracy.
- This is the most widely used method (e.g., in C4.5).

Other methods: statistical significance tests (e.g., chi-square).
Real-Valued Inputs

What should we do to deal with real-valued inputs?

mpg    cylinders  displacement  horsepower  weight  acceleration  modelyear  maker
good   4          97            75          2265    18.2          77         asia
bad    6          199           90          2648    15            70         america
bad    4          121           110         2600    12.8          77         europe
bad    8          350           175         4100    13            73         america
bad    6          198           95          3102    16.5          74         america
bad    4          108           94          2379    16.5          73         asia
bad    4          113           95          2228    14            71         asia
bad    8          302           139         3570    12.8          78         america
:      :          :             :           :       :             :          :
good   4          120           79          2625    18.6          82         america
bad    8          455           225         4425    10            70         america
good   4          107           86          2464    15.5          76         europe
bad    5          131           103         2830    15.9          78         europe
Information Gain for a Real-Valued Input
- x: a real-valued input
- t: a split value (threshold)
- Find the split value t such that the mutual information I(x, y : t) between x and the class label y is maximized.
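A sketch of this threshold search, scanning the midpoints between consecutive sorted values (the data below is illustrative, not the actual MPG table):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(xs, ys):
    """Return (t, gain): the threshold t on a real-valued input x maximizing
    the mutual information I(x, y : t) of the binary split x < t vs. x >= t.
    Candidates are midpoints between consecutive distinct sorted values."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    base = entropy(ys)
    best_t, best_gain = None, -1.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal x values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x < t]
        right = [y for x, y in pairs if x >= t]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

# illustrative weights and labels with a clean boundary near 3100
weights = [2265, 2648, 2600, 4100, 2379, 3570]
labels = ["good", "good", "good", "bad", "good", "bad"]
t, gain = best_split(weights, labels)
print(t, round(gain, 3))  # 3109.0 0.918
```

Only n - 1 candidate thresholds need checking, since the gain can change only at boundaries between consecutive sorted values.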
Conclusions
- Decision trees are the single most popular data mining tool:
  - Easy to understand
  - Easy to implement
  - Easy to use
  - Computationally cheap
- It's possible to get into trouble with overfitting.
- They do classification: predict a categorical output from categorical and/or real-valued inputs.
Software
- The most widely used decision tree implementation: C4.5 (and its successor C5.0)
- Source code and tutorial: http://www2.cs.uregina.ca/~hamilton/courses/831/notes/ml/dtrees/c4.5/tutorial.html
The End