22
1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AI Decision Trees

1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees

Embed Size (px)

Citation preview

Page 1: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees

1

CSCI 3202: Introduction to AI

Decision Trees

Greg Grudic

(Notes borrowed from Thomas G. Dietterich and Tom Mitchell)

Intro AI Decision Trees

Page 2: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees

2

Outline

• Decision Tree Representations– ID3 and C4.5 learning algorithms (Quinlan

1986)– CART learning algorithm (Breiman et al. 1985)

• Entropy, Information Gain

• Overfitting

Intro AI Decision Trees

Page 3: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees

3

Training Data Example: Goal is to Predict When This Player Will Play Tennis?

Intro AI Decision Trees

Page 4: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees

4Intro AI Decision Trees

Page 5: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees

5Intro AI Decision Trees

Page 6: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees

6Intro AI Decision Trees

Page 7: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees

7Intro AI Decision Trees

Page 8: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees

8

( ) ( ){ }1 1, ,..., ,N NS y y= x x

Learning Algorithm for Decision Trees

( ){ }

1,...,

, 0,1

d

j

x x

x y

=

Î

x

What happens if features are not binary? What about regression?Intro AI Decision Trees

Page 9: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees

9

Choosing the Best Attribute

Intro AI Decision Trees

- Many different frameworks for choosing BEST have been proposed! - We will look at Entropy Gain.

Number +and – examplesbefore and after

a split.

A1 and A2 are “attributes” (i.e. features or inputs).

Page 10: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees

10

Entropy

Intro AI Decision Trees

Page 11: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees

11Intro AI Decision Trees

Entropy is like a measure of purity…

Page 12: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees

12

Entropy

Intro AI Decision Trees

Page 13: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees

13

Information Gain

Intro AI Decision Trees

Page 14: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees

14

Training Example

Intro AI Decision Trees

Page 15: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees

15

Selecting the Next Attribute

Intro AI Decision Trees

Page 16: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees

16Intro AI Decision Trees

Page 17: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees

17

Non-Boolean Features

• Features with multiple discrete values– Multi-way splits– Test for one value versus the rest– Group values into disjoint sets

• Real-valued features– Use thresholds

• Regression– Splits based on mean squared error metric

Intro AI Decision Trees

Page 18: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees

18

Hypothesis Space Search

Intro AI Decision Trees

You do not get the globally optimal tree!- Search space is exponential.

Page 19: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees

19

Overfitting

Intro AI Decision Trees

Page 20: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees

20

Overfitting in Decision Trees

Intro AI Decision Trees

Page 21: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees

21

Validation Data is Used to Control Overfitting

• Prune tree to reduce error on validation set

Intro AI Decision Trees

Page 22: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees

Quiz 7

1. What is overfitting?2. Are decision trees universal approximators?3. What do decision tree boundaries look like?4. Why are decision trees not globally optimal?5. Do Support Vector Machines give the globally optimal

solution (given the SVN optimization criteria and a specific kernel)?

6. Does the K Nearest Neighbor algorithm give the globally optimal solution if K is allowed to be as large as the number of training examples (N), N fold cross validation is used, and the distance metric is fixed?

Intro AI Decision Trees 22