Data Mining and Machine Learning: Decision Trees and ID3
David Corne
From Heather’s blog: http://www.prettystrongmedicine.com/p/about.html
Decision Trees
Real world applications of DTs
See here for a list: http://www.cbcb.umd.edu/~salzberg/docs/murthy_thesis/survey/node32.html
Includes: Agriculture, Astronomy, Biomedical Engineering, Control Systems, Financial analysis, Manufacturing and Production, Medicine, Molecular biology, Object recognition, Pharmacology, Physics, Plant diseases, Power systems, Remote Sensing, Software development, and Text processing.
[Figure: an example data table, labelled to show its field names, field values, and class values]
Why decision trees?
Popular, since they are interpretable
... and correspond to human reasoning/thinking about decision-making
Can perform quite well in accuracy when compared with other approaches
... and there are good algorithms to learn decision trees from data
Figure 1. Binary Strategy as a tree model.
Mohammed MA, Rudge G, Wood G, Smith G, et al. (2012) Which Is More Useful in Predicting Hospital Mortality - Dichotomised Blood Test Results or Actual Test Values? A Retrospective Study in Two Hospitals. PLoS ONE 7(10): e46860. doi:10.1371/journal.pone.0046860 http://www.plosone.org/article/info:doi/10.1371/journal.pone.0046860
We will learn the ‘classic’ algorithm to learn a DT from categorical data:
ID3
Suppose we want a tree that helps us predict someone’s politics, given their gender, age, and wealth
gender  age          wealth  politics
male    middle-aged  rich    Right-wing
male    young        rich    Right-wing
female  young        poor    Left-wing
female  middle-aged  poor    Left-wing
male    young        poor    Right-wing
male    old          poor    Right-wing
Choose a start node (field) at random
Age
Add branches for each value of this field
Age
  young
  mid
  old
Check to see what has filtered down
Age
  young: 1 L, 2 R
  mid: 1 L, 1 R
  old: 0 L, 1 R
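A minimal Python sketch of this "filtering down" step, assuming the six rows are stored as tuples. The names ROWS, FIELDS and class_counts_by_value are illustrative and do not come from the slides.

```python
from collections import Counter, defaultdict

# The six training rows from the table: (gender, age, wealth, politics).
ROWS = [
    ("male", "middle-aged", "rich", "Right-wing"),
    ("male", "young", "rich", "Right-wing"),
    ("female", "young", "poor", "Left-wing"),
    ("female", "middle-aged", "poor", "Left-wing"),
    ("male", "young", "poor", "Right-wing"),
    ("male", "old", "poor", "Right-wing"),
]
FIELDS = {"gender": 0, "age": 1, "wealth": 2}  # column index of each field
CLASS_COL = 3                                   # column holding the class value


def class_counts_by_value(rows, field):
    """Group rows by their value for `field` and count the classes in each group."""
    counts = defaultdict(Counter)
    col = FIELDS[field]
    for row in rows:
        counts[row[col]][row[CLASS_COL]] += 1
    return {value: dict(c) for value, c in counts.items()}


age_counts = class_counts_by_value(ROWS, "age")
for value, counts in age_counts.items():
    print(value, counts)
```

A branch whose group contains a single class (here, "old") can be given that class straight away; the others need further nodes.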
Where possible, assign a class value
Age
  young: 1 L, 2 R
  mid: 1 L, 1 R
  old: 0 L, 1 R -> Right-Wing
Otherwise, we need to add further nodes
Age
  young: 1 L, 2 R -> ?
  mid: 1 L, 1 R -> ?
  old: 0 L, 1 R -> Right-Wing
Repeat this process every time we need a new node
Starting with first new node – choose field at random
Age
  young: 1 L, 2 R -> wealth
  mid: 1 L, 1 R -> ?
  old: 0 L, 1 R -> Right-Wing
Check the classes of the data at this node…
Age
  young: 1 L, 2 R -> wealth
    rich: 0 L, 1 R
    poor: 1 L, 1 R
  mid: 1 L, 1 R -> ?
  old: 0 L, 1 R -> Right-Wing
And so on …
Age
  young: 1 L, 2 R -> wealth
    rich: Right-wing
    poor: 1 L, 1 R
  mid: 1 L, 1 R -> ?
  old: 0 L, 1 R -> Right-Wing
But we can do better than randomly chosen fields!
This is the tree we get if first choice is ‘gender’
gender
  male: Right-Wing
  female: Left-Wing
Algorithms for building decision trees (of this type)
Initialise: tree T contains one ‘unexpanded’ node
Repeat until no unexpanded nodes:
    remove an unexpanded node U from T
    expand U by choosing a field
    add the resulting nodes to T
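The loop above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the slides' own implementation: nodes are plain dicts, a node becomes a leaf once its rows share one class, a field is not reused further down the same path, and the names build_tree, classify and choose_field are hypothetical.

```python
import random

DATA = [
    {"gender": "male", "age": "middle-aged", "wealth": "rich", "politics": "Right-wing"},
    {"gender": "male", "age": "young", "wealth": "rich", "politics": "Right-wing"},
    {"gender": "female", "age": "young", "wealth": "poor", "politics": "Left-wing"},
    {"gender": "female", "age": "middle-aged", "wealth": "poor", "politics": "Left-wing"},
    {"gender": "male", "age": "young", "wealth": "poor", "politics": "Right-wing"},
    {"gender": "male", "age": "old", "wealth": "poor", "politics": "Right-wing"},
]


def build_tree(rows, fields, target, choose_field):
    """Expand unexpanded nodes one at a time until none remain."""
    root = {"rows": rows, "fields": list(fields)}
    unexpanded = [root]
    while unexpanded:
        node = unexpanded.pop()  # remove an unexpanded node U
        labels = [r[target] for r in node["rows"]]
        if len(set(labels)) == 1 or not node["fields"]:
            # Pure node, or no field left to split on: assign a class value
            # (majority class; ties broken alphabetically, an assumption).
            node["label"] = max(sorted(set(labels)), key=labels.count)
            continue
        field = choose_field(node["rows"], node["fields"])  # expand U
        node["field"] = field
        node["children"] = {}
        remaining = [f for f in node["fields"] if f != field]
        for value in sorted({r[field] for r in node["rows"]}):
            child = {"rows": [r for r in node["rows"] if r[field] == value],
                     "fields": remaining}
            node["children"][value] = child
            unexpanded.append(child)  # add the resulting nodes
    return root


def classify(tree, row):
    """Follow the branches matching `row` until a leaf is reached."""
    node = tree
    while "label" not in node:
        node = node["children"][row[node["field"]]]
    return node["label"]


FIELD_NAMES = ["gender", "age", "wealth"]
# Field chosen at random, as in the walkthrough.
random_tree = build_tree(DATA, FIELD_NAMES, "politics",
                         lambda rows, fields: random.choice(fields))
# Always take the first remaining field, so the root splits on gender.
gender_tree = build_tree(DATA, FIELD_NAMES, "politics",
                         lambda rows, fields: fields[0])
```

The only part that differs between a random tree and an ID3-style tree is the choose_field argument, which is why it is passed in as a function here.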
Algorithms for building decision trees (of this type) – expanding a node
?
Algorithms for building decision trees (of this type) – the essential step
Field
  Value = X: ?
  Value = Y: ?
  Value = Z: ?
So, which field?
Three choices: gender, age, or wealth
Suppose we choose age (table now sorted by age values)
gender  age          wealth  politics
male    middle-aged  rich    Right-wing
female  middle-aged  poor    Left-wing
male    old          poor    Right-wing
male    young        rich    Right-wing
female  young        poor    Left-wing
male    young        poor    Right-wing
Two of the values have a mixture of classes
Suppose we choose wealth (table now sorted by wealth values)
gender  age          wealth  politics
female  middle-aged  poor    Left-wing
male    old          poor    Right-wing
female  young        poor    Left-wing
male    young        poor    Right-wing
male    middle-aged  rich    Right-wing
male    young        rich    Right-wing
One of the values has a mixture of classes - this choice is a bit less mixed up than age?
Suppose we choose gender (table now sorted by gender values)
gender  age          wealth  politics
female  middle-aged  poor    Left-wing
female  young        poor    Left-wing
male    old          poor    Right-wing
male    middle-aged  rich    Right-wing
male    young        poor    Right-wing
male    young        rich    Right-wing
The classes are not mixed up at all within the values
So, at each step where we choose a node to expand, we make the choice where the relationship between the field values and the class values is least mixed up.
Measuring ‘mixed-up’ness: Shannon’s entropy measure
Suppose you have a bag of N discrete things, and there are T different types of things.
Where p_T is the proportion of things in the bag that are of type T, the entropy of the bag is:
$$H = -\sum_{T} p_T \log(p_T)$$
(The worked examples below use base-10 logarithms.)
Examples:
This mixture: { left left left right right } has entropy: − ( 0.6 log(0.6) + 0.4 log(0.4) ) = 0.292
This mixture: { A A A A A A A A B C } has entropy: − ( 0.8 log(0.8) + 0.1 log(0.1) + 0.1 log(0.1) ) = 0.278
This mixture: { same same same same same same } has entropy: − ( 1.0 log(1.0) ) = 0
Lower entropy = less mixed up
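The three example values can be reproduced with a few lines of Python. This is a sketch that assumes base-10 logarithms, which is what the figures 0.292 and 0.278 correspond to. The function name entropy and its base parameter are illustrative choices.

```python
import math
from collections import Counter


def entropy(items, base=10):
    """Shannon entropy of a bag of discrete items: the sum of -p * log(p)."""
    n = len(items)
    proportions = [count / n for count in Counter(items).values()]
    return sum(-p * math.log(p, base) for p in proportions)


two_types = entropy(["left"] * 3 + ["right"] * 2)
three_types = entropy(["A"] * 8 + ["B", "C"])
one_type = entropy(["same"] * 6)
print(round(two_types, 3), round(three_types, 3), one_type)
```

Changing the base only rescales every entropy by the same constant, so it does not change which bag counts as more mixed up.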
ID3 chooses fields based on entropy
Field1          Field2          Field3          …
val1 x p1       val1 x p1       val1 x p1
val2 x p2       val2 x p2       val2 x p2
val3 x p3       val3 x p3
= H(D|Field1)   = H(D|Field2)   = H(D|Field3)
Each val has an entropy value: how mixed up the classes are for that value choice.
And each val also has a proportion: how much of the data at this node has this val.
So ID3 works out H(D|Field) for each field, which is the sum of the entropies of the values weighted by their proportions.
The one with the lowest value is chosen; this maximises ‘Information Gain’.
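As a compact restatement of the weighting described above, one common way to write it is given below. The symbols F, v, D_v, p_v and Gain are notation introduced here for illustration and are not taken from the slides.

```latex
H(D \mid F) = \sum_{v \in \mathrm{values}(F)} p_v \, H(D_v),
\qquad
\mathrm{Gain}(F) = H(D) - H(D \mid F)
```

Here D_v is the subset of the data at the node whose value for field F is v, and p_v = |D_v| / |D|. Since H(D) is the same whichever field is considered, picking the field with the lowest H(D|F) is the same as picking the one with the highest gain.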
Back here: gender, age, or wealth
Suppose we choose age (table now sorted by age values)
gender  age          wealth  politics
male    middle-aged  rich    Right-wing
female  middle-aged  poor    Left-wing
male    old          poor    Right-wing
male    young        rich    Right-wing
female  young        poor    Left-wing
male    young        poor    Right-wing
H(D|age) = proportion-weighted entropy
  = 0.3333 x − ( 0.5 x log(0.5) + 0.5 x log(0.5) )      (middle-aged)
  + 0.1667 x − ( 1 x log(1) )                           (old)
  + 0.5 x − ( 0.33 x log(0.33) + 0.67 x log(0.67) )     (young)
  ≈ 0.239
Suppose we choose wealth (table now sorted by wealth values)
gender  age          wealth  politics
female  middle-aged  poor    Left-wing
male    old          poor    Right-wing
female  young        poor    Left-wing
male    young        poor    Right-wing
male    middle-aged  rich    Right-wing
male    young        rich    Right-wing
H(D|wealth)
  = 0.6667 x − ( 0.5 x log(0.5) + 0.5 x log(0.5) )      (poor)
  + 0.3333 x − ( 1 x log(1) )                           (rich)
  ≈ 0.201
Suppose we choose gender (table now sorted by gender values)
gender  age          wealth  politics
female  middle-aged  poor    Left-wing
female  young        poor    Left-wing
male    old          poor    Right-wing
male    middle-aged  rich    Right-wing
male    young        poor    Right-wing
male    young        rich    Right-wing
H(D|gender)
  = 0.3333 x − ( 1 x log(1) )      (female)
  + 0.6667 x − ( 1 x log(1) )      (male)
  = 0
This is the one we would choose ...
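The three calculations can be checked with a short Python sketch. It assumes base-10 logarithms (as in the entropy examples) and stores each row as a dict. The names DATA, entropy and conditional_entropy are illustrative only.

```python
import math
from collections import Counter, defaultdict

DATA = [
    {"gender": "male", "age": "middle-aged", "wealth": "rich", "politics": "Right-wing"},
    {"gender": "male", "age": "young", "wealth": "rich", "politics": "Right-wing"},
    {"gender": "female", "age": "young", "wealth": "poor", "politics": "Left-wing"},
    {"gender": "female", "age": "middle-aged", "wealth": "poor", "politics": "Left-wing"},
    {"gender": "male", "age": "young", "wealth": "poor", "politics": "Right-wing"},
    {"gender": "male", "age": "old", "wealth": "poor", "politics": "Right-wing"},
]


def entropy(labels, base=10):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log(c / n, base) for c in Counter(labels).values())


def conditional_entropy(rows, field, target="politics"):
    """H(D|field): entropy of each value's group, weighted by its proportion."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[field]].append(row[target])
    n = len(rows)
    return sum(len(labels) / n * entropy(labels) for labels in groups.values())


scores = {f: conditional_entropy(DATA, f) for f in ("gender", "age", "wealth")}
best = min(scores, key=scores.get)  # lowest H(D|field) wins
for field, score in scores.items():
    print(field, round(score, 3))
print("chosen:", best)
```

Passing a function like this as the field chooser at every node, in place of a random choice, is the essential difference between the random walkthrough and ID3.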
Alternatives to Information Gain: all, somehow or other, give a measure of mixed-upness and have been used in building DTs
• Chi Square
• Gain Ratio
• Symmetric Gain Ratio
• Gini index
• Modified Gini index
• Symmetric Gini index
• J-Measure
• Minimum Description Length
• Relevance
• RELIEF
• Weight of Evidence
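To illustrate how one of these alternatives also measures mixed-upness, here is a minimal sketch of the Gini index in its usual "Gini impurity" form, 1 minus the sum of squared proportions. Taking that form to be the one meant in the list is an assumption, and the function name gini is illustrative.

```python
from collections import Counter


def gini(items):
    """Gini impurity of a bag: 1 - sum(p^2) over the types present."""
    n = len(items)
    return 1.0 - sum((count / n) ** 2 for count in Counter(items).values())


mixed = gini(["left"] * 3 + ["right"] * 2)   # 1 - (0.6^2 + 0.4^2)
pure = gini(["same"] * 6)                    # 1 - 1.0^2
print(mixed, pure)
```

Like entropy, it is 0 for a bag of one type and grows as the bag becomes more evenly mixed.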
Decision Trees
Further reading is on Google
Interesting topics in context are:
Pruning: close a branch down before you hit 0 entropy (why?)
Discretization and regression: trees that deal with real-valued fields
Decision Forests: what do you think these are?