26
1 Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Modified by Longin Jan Latecki Some slides by Piyush Rai Intro AI Decision Trees

1 Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Modified by Longin Jan Latecki Some slides by Piyush Rai Intro

Embed Size (px)

Citation preview

Page 1: 1 Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Modified by Longin Jan Latecki Some slides by Piyush Rai Intro

1

Decision Trees

Greg Grudic

(Notes borrowed from Thomas G. Dietterich and Tom Mitchell)

Modified by Longin Jan Latecki

Some slides by Piyush Rai

Intro AI Decision Trees

Page 2: 1 Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Modified by Longin Jan Latecki Some slides by Piyush Rai Intro

2

Outline

• Decision Tree Representations– ID3 and C4.5 learning algorithms (Quinlan

1986)– CART learning algorithm (Breiman et al. 1985)

• Entropy, Information Gain

• Overfitting

Intro AI Decision Trees

Page 3: 1 Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Modified by Longin Jan Latecki Some slides by Piyush Rai Intro

3

Training Data Example: Goal is to Predict When This Player Will Play Tennis?

Intro AI Decision Trees

Page 4: 1 Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Modified by Longin Jan Latecki Some slides by Piyush Rai Intro

4Intro AI Decision Trees

Page 5: 1 Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Modified by Longin Jan Latecki Some slides by Piyush Rai Intro

5Intro AI Decision Trees

Page 6: 1 Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Modified by Longin Jan Latecki Some slides by Piyush Rai Intro

6Intro AI Decision Trees

Page 7: 1 Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Modified by Longin Jan Latecki Some slides by Piyush Rai Intro

7Intro AI Decision Trees

Page 8: 1 Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Modified by Longin Jan Latecki Some slides by Piyush Rai Intro

8

( ) ( ){ }1 1, ,..., ,N NS y y= x x

Learning Algorithm for Decision Trees

( ){ }

1,...,

, 0,1

d

j

x x

x y

=

Î

x

What happens if features are not binary? What about regression?Intro AI Decision Trees

Page 9: 1 Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Modified by Longin Jan Latecki Some slides by Piyush Rai Intro

9

Choosing the Best Attribute

Intro AI Decision Trees

- Many different frameworks for choosing BEST have been proposed! - We will look at Entropy Gain.

Number +and – examplesbefore and after

a split.

A1 and A2 are “attributes” (i.e. features or inputs).

Page 10: 1 Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Modified by Longin Jan Latecki Some slides by Piyush Rai Intro

10

Entropy

Intro AI Decision Trees

Page 11: 1 Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Modified by Longin Jan Latecki Some slides by Piyush Rai Intro

11Intro AI Decision Trees

Entropy is like a measure of impurity…

Page 12: 1 Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Modified by Longin Jan Latecki Some slides by Piyush Rai Intro

12

Entropy

Intro AI Decision Trees

Page 13: 1 Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Modified by Longin Jan Latecki Some slides by Piyush Rai Intro

Intro AI Decision Trees 13

Page 14: 1 Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Modified by Longin Jan Latecki Some slides by Piyush Rai Intro

14

Information Gain

Intro AI Decision Trees

Page 15: 1 Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Modified by Longin Jan Latecki Some slides by Piyush Rai Intro

Intro AI Decision Trees 15

Page 16: 1 Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Modified by Longin Jan Latecki Some slides by Piyush Rai Intro

Intro AI Decision Trees 16

Page 17: 1 Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Modified by Longin Jan Latecki Some slides by Piyush Rai Intro

Intro AI Decision Trees 17

Page 18: 1 Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Modified by Longin Jan Latecki Some slides by Piyush Rai Intro

18

Training Example

Intro AI Decision Trees

Page 19: 1 Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Modified by Longin Jan Latecki Some slides by Piyush Rai Intro

19

Selecting the Next Attribute

Intro AI Decision Trees

Page 20: 1 Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Modified by Longin Jan Latecki Some slides by Piyush Rai Intro

20Intro AI Decision Trees

Page 21: 1 Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Modified by Longin Jan Latecki Some slides by Piyush Rai Intro

21

Non-Boolean Features

• Features with multiple discrete values– Multi-way splits– Test for one value versus the rest– Group values into disjoint sets

• Real-valued features– Use thresholds

• Regression– Splits based on mean squared error metric

Intro AI Decision Trees

Page 22: 1 Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Modified by Longin Jan Latecki Some slides by Piyush Rai Intro

22

Hypothesis Space Search

Intro AI Decision Trees

You do not get the globally optimal tree!- Search space is exponential.

Page 23: 1 Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Modified by Longin Jan Latecki Some slides by Piyush Rai Intro

23

Overfitting

Intro AI Decision Trees

Page 24: 1 Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Modified by Longin Jan Latecki Some slides by Piyush Rai Intro

24

Overfitting in Decision Trees

Intro AI Decision Trees

Page 25: 1 Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Modified by Longin Jan Latecki Some slides by Piyush Rai Intro

25

Validation Data is Used to Control Overfitting

• Prune tree to reduce error on validation set

Intro AI Decision Trees

Page 26: 1 Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Modified by Longin Jan Latecki Some slides by Piyush Rai Intro

Homework

• Which feature will be at the root node of the decision tree trained for the following data? In other words which attribute makes a person most attractive?

Intro AI Decision Trees 26

Height Hair Eyes Attractive?

small blonde brown No

tall dark brown No

tall blonde blue Yes

tall dark Blue No

small dark Blue No

tall red Blue Yes

tall blonde brown No

small blonde blue Yes