

Decision Tree Learning

Richard McAllister

February 4, 2008

Outline

1 Overview

2 Tree Construction

3 Case Study: Determinants of House Price


Decision Tree Learning

- Widely used
- Used for approximating discrete-valued functions
- Robust to noisy data
- Capable of learning disjunctive expressions


Algorithms

- ID3
- Assistant
- C4.5

Uses information entropy / information gain


Inductive Bias

- Preference for small trees over large
- Occam's Razor: prefer the simplest hypothesis that fits the data.


Tree

    Outlook
      Sunny → Humidity
        High → No
        Normal → Yes
      Overcast → Yes
      Rain → Wind
        Strong → No
        Weak → Yes

Leftmost Branch: ⟨Outlook = Sunny, Temperature = Hot, Humidity = High, Wind = Strong⟩


Disjunction of Conjunctions

(Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak)


Appropriate Problems

- Instances are represented by attribute-value pairs
- The target function has discrete output values (e.g. yes or no)
- Disjunctive descriptions may be required
- Training data may contain errors
- Training data may contain missing attribute values


For Example...

- Classifying medical patients by disease
- Loan applications
- Equipment malfunctions
- Classification problems


ID3

- Learns by constructing decision trees top-down: "What should be tested first?" (see the sketch below)
- Greedy
- Descendants are then chosen from the possible values resulting from a decision option
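A minimal sketch of this top-down, greedy recursion (illustrative, not ID3's original implementation). It assumes examples shaped as (attribute-dict, label) pairs and a score(examples, attribute) function, the attribute-selection measure introduced on the next slides:

    def id3(examples, attributes, score):
        labels = [label for _, label in examples]
        if len(set(labels)) == 1:        # all examples agree: pure leaf
            return labels[0]
        if not attributes:               # no attributes left: majority leaf
            return max(set(labels), key=labels.count)
        # Greedy step: test the attribute that scores best on this subset.
        best = max(attributes, key=lambda a: score(examples, a))
        remaining = [a for a in attributes if a != best]
        tree = {best: {}}
        # One branch per observed value of the chosen attribute.
        for v in {attrs[best] for attrs, _ in examples}:
            subset = [(attrs, label) for attrs, label in examples
                      if attrs[best] == v]
            tree[best][v] = id3(subset, remaining, score)
        return tree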


How do you choose what’s next?

Information gain: measures how well a given attribute separates the training examples according to their target classifications.


Information Gain

- Entropy characterizes the (im)purity of an arbitrary collection of examples.
- Information gain measures the expected reduction in entropy caused by knowing the value of an attribute A.

For a boolean target concept:

    Entropy(S) ≡ −p⊕ log2 p⊕ − p⊖ log2 p⊖

where:
- S: the collection of positive and negative examples of some target concept
- p⊕: the proportion of positive examples in S
- p⊖: the proportion of negative examples in S

Incidentally, the general c-class form:

    Entropy(S) ≡ Σ_{i=1}^{c} −p_i log2 p_i
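The formulas above, sketched in Python (assuming a collection is reduced to its list of class labels):

    import math
    from collections import Counter

    # Entropy(S): sum over the class proportions p_i of -p_i log2 p_i.
    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    # A 9-positive / 5-negative collection, as behind the gains quoted later:
    print(entropy(["yes"] * 9 + ["no"] * 5))  # ~0.940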


Information Gain (Cont’d)

    Gain(S, A) ≡ Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) Entropy(S_v)

where:
- S: the collection of positive and negative examples of some target concept
- A: a particular attribute
- S_v: the collection of positive and negative examples of the same target concept after A has partitioned S: S_v = {s ∈ S | A(s) = v}
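A sketch of Gain(S, A) under the same assumed example shape, reusing entropy() from the previous sketch:

    # Gain(S, A) = Entropy(S) - sum over values v of |S_v|/|S| * Entropy(S_v).
    # Assumes entropy() above and (attribute-dict, label) examples.
    def information_gain(examples, attribute):
        gain = entropy([label for _, label in examples])
        for v in {attrs[attribute] for attrs, _ in examples}:
            s_v = [label for attrs, label in examples if attrs[attribute] == v]
            gain -= len(s_v) / len(examples) * entropy(s_v)
        return gain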


Use of Information Gain

Information gain is calculated for each attribute:

- Gain(S, Outlook) = 0.246
- Gain(S, Humidity) = 0.151
- Gain(S, Wind) = 0.048
- Gain(S, Temperature) = 0.029

The maximum is chosen at each level.
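Putting the earlier sketches together on a small hypothetical dataset (the values are illustrative, not the full example set behind the gains above): id3 picks the maximum-gain attribute at each level.

    examples = [
        ({"Outlook": "Sunny",    "Wind": "Weak"},   "No"),
        ({"Outlook": "Sunny",    "Wind": "Strong"}, "No"),
        ({"Outlook": "Overcast", "Wind": "Weak"},   "Yes"),
        ({"Outlook": "Rain",     "Wind": "Weak"},   "Yes"),
        ({"Outlook": "Rain",     "Wind": "Strong"}, "No"),
    ]
    print(id3(examples, ["Outlook", "Wind"], score=information_gain))
    # {'Outlook': {'Sunny': 'No', 'Overcast': 'Yes',
    #              'Rain': {'Wind': {'Weak': 'Yes', 'Strong': 'No'}}}}
    # (key order may vary)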


Continues until...

1. Every attribute has already been included along the tree's path.

2. Training examples associated with leaf nodes all have the same target attribute value (i.e. entropy is zero).


Hypothesis Space Search in Decision Tree Learning

- Search proceeds by hill-climbing.
- The information gain measure guides the hill-climbing.


Capabilities and Limitations

- ID3's hypothesis space is a complete space of finite, discrete-valued functions.
- It maintains only one current hypothesis as it searches through the tree.
- No backtracking: it can reach a local optimum that is not global.
- It uses all training examples at each step in its statistics-based decisions.


Issues

- Overfitting the Data
- Incorporating Continuous-Valued Attributes
- Alternative Measures for Selecting Attributes
- Handling Training Examples with Missing Attribute Values
- Handling Attributes with Differing Costs


Overfitting the Data

- Overfitting: some other hypothesis that fits the training examples less well actually performs better over the entire distribution of instances.
- Adding incorrect training data will cause ID3 to create a more complex tree.
- Adding noise can also cause overfitting.
- Prevention:
  - Reduced-error pruning: replace a subtree with a leaf node, using the most common classification of the associated training examples to label it (sketched below).
  - Rule post-pruning: convert the tree to rules, then prune each rule by removing any preconditions whose removal improves the rule's estimated accuracy.
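A sketch of reduced-error pruning in a common bottom-up variant, operating on the dict trees from the earlier sketches and assuming a held-out validation set whose attribute values were all seen in training (names and data shapes are illustrative):

    def classify(tree, attrs):
        # Leaves are plain labels; internal nodes are {attribute: {value: subtree}}.
        while isinstance(tree, dict):
            attribute = next(iter(tree))
            tree = tree[attribute][attrs[attribute]]
        return tree

    def accuracy(tree, examples):
        return sum(classify(tree, a) == l for a, l in examples) / len(examples)

    def prune(tree, training, validation):
        if not isinstance(tree, dict) or not validation:
            return tree  # a leaf, or no validation evidence here: leave as-is
        attribute = next(iter(tree))
        for v, subtree in tree[attribute].items():
            t = [(a, l) for a, l in training if a[attribute] == v]
            val = [(a, l) for a, l in validation if a[attribute] == v]
            tree[attribute][v] = prune(subtree, t or training, val)
        # Candidate leaf: most common classification of the training examples.
        labels = [l for _, l in training]
        leaf = max(set(labels), key=labels.count)
        # Keep the simpler hypothesis if validation accuracy does not drop.
        return leaf if accuracy(leaf, validation) >= accuracy(tree, validation) else tree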


Incorporating Continuous-Valued Attributes

Discretizing: dynamically defining new discrete-valued attributes that partition the continuous attribute value into a discrete set of intervals.
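One standard realization of this idea (as in C4.5-style learners; the temperature data below is illustrative): sort the examples by the continuous value, take candidate thresholds at midpoints where the class label changes, then score each resulting boolean attribute with information gain.

    # Candidate split points for a continuous attribute: midpoints between
    # adjacent sorted values whose class labels differ.
    def candidate_thresholds(values_and_labels):
        pts = sorted(values_and_labels)
        return [(v1 + v2) / 2
                for (v1, l1), (v2, l2) in zip(pts, pts[1:])
                if l1 != l2]

    print(candidate_thresholds(
        [(40, "No"), (48, "No"), (60, "Yes"),
         (72, "Yes"), (80, "Yes"), (90, "No")]))
    # [54.0, 85.0]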


Alternative Measures for Selecting Attributes

- SplitInformation(S, A): measures how broadly and uniformly the attribute A splits the data.
- Distance-based measures
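A sketch of SplitInformation and the gain-ratio correction built from it (Quinlan's measure), under the same assumed example shape and reusing information_gain() from the earlier sketch:

    import math
    from collections import Counter

    # SplitInformation(S, A): entropy of S with respect to the values of A
    # itself (how broadly/uniformly A splits S), not the class labels.
    def split_information(examples, attribute):
        n = len(examples)
        counts = Counter(attrs[attribute] for attrs, _ in examples)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    # Gain ratio penalizes attributes that split the data very broadly.
    # Assumes A takes more than one value here (else the denominator is zero).
    def gain_ratio(examples, attribute):
        return information_gain(examples, attribute) / split_information(examples, attribute)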


Handling Training Examples with Missing Attribute Values

- Assign the missing attribute the most common value among the training examples.
- Assign a probability to each of the possible values.
- Estimate the value of the missing attribute based on the observed frequencies of the values among the training examples.
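A sketch of the first strategy (names illustrative; None marks a missing value): fill with the attribute's most common observed value.

    from collections import Counter

    def fill_most_common(examples, attribute):
        # Most common observed value of the attribute among the training examples.
        observed = Counter(attrs[attribute] for attrs, _ in examples
                           if attrs[attribute] is not None)
        fill, _ = observed.most_common(1)[0]
        for attrs, _ in examples:
            if attrs[attribute] is None:
                attrs[attribute] = fill
        return examples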


Handling Attributes with Differing Costs

- Introduce a cost term into the attribute selection measure.
- Example:

    Gain²(S, A) / Cost(A)
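That example measure as a one-function sketch (the cost table is an assumed input; information_gain() is from the earlier sketch):

    # Cost-sensitive attribute score: squaring the gain keeps high-gain
    # attributes attractive while the denominator penalizes expensive ones.
    def cost_sensitive_score(examples, attribute, costs):
        return information_gain(examples, attribute) ** 2 / costs[attribute]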


Case Study: Determinants of House Price

The house purchasing decision depends on household preferences or priority orderings over dwelling attributes or services; the attributes or services therefore probably differ in importance in the determination of housing prices.


Decision Tree Application

- CART algorithm
- Makes use of an impurity measure
- Impurity:
  - Low value: indicates the predomination of a single class within a set of cases
  - High value: indicates an even distribution of classes


Gini Impurity Measure

    I_G(i) = 1 − Σ_{j=1}^{m} f(i, j)² = Σ_{j≠k} f(i, j) f(i, k)

where f(i, j) is the probability of getting value j in node i.

Gini impurity is based on the squared probabilities of membership for each target category in the node. It reaches its minimum (zero) when all cases in the node fall into a single target category.
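A sketch of the Gini measure, estimating f(i, j) by the class frequencies among the cases in the node:

    from collections import Counter

    # Gini impurity: 1 - sum_j f(i, j)^2 over the class proportions in the node.
    def gini(labels):
        n = len(labels)
        return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

    print(gini(["high"] * 10))               # 0.0: a single class predominates
    print(gini(["high"] * 5 + ["low"] * 5))  # 0.5: classes evenly distributed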


[Figure 1. Decision tree results for the HDB flats (Gang-Zhi Fan et al.)]