COM24111: Machine Learning Decision Trees Gavin Brown www.cs.man.ac.uk/~gbrown


Page 1:

COM24111: Machine Learning

Decision Trees

Gavin Brown
www.cs.man.ac.uk/~gbrown

Page 2:

Recap: threshold classifiers

[Figure: scatter plot of height against weight, with a vertical threshold at weight = t]

if weight > t then "player" else "dancer"

Page 3:

Q. Where is a good threshold?

[Figure: 1-D data points plotted along an axis from 10 to 60]

Also known as “decision stump”

Page 4:

From Decision Stumps, to Decision Trees

- New type of non-linear model

- Copes naturally with continuous and categorical data

- Fast to both train and test (highly parallelizable)

- Generates a set of interpretable rules

Page 5:

Recap: Decision Stumps

The stump “splits” the dataset. Here we have 4 classification errors.

[Figure: the stump’s “no” branch receives 10.5, 14.1, 17.0, 21.0, 23.2 (predict 0); its “yes” branch receives 27.1, 30.1, 42.0, 47.0, 57.3, 59.9 (predict 1)]
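The figure itself is lost, but the idea is easy to make concrete. Below is a minimal Python sketch of scoring a candidate threshold by counting classification errors; the labels ys are hypothetical, since the true ones appear only in the missing figure.

def stump_errors(xs, ys, threshold):
    # Errors made by the stump: predict 1 if x > threshold, else 0.
    return sum((x > threshold) != (y == 1) for x, y in zip(xs, ys))

# The 11 data points from the slide; these labels are hypothetical.
xs = [10.5, 14.1, 17.0, 21.0, 23.2, 27.1, 30.1, 42.0, 47.0, 57.3, 59.9]
ys = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1]

# Try each data point as a threshold; keep the one with fewest errors.
best_t = min(xs, key=lambda t: stump_errors(xs, ys, t))
print(best_t, stump_errors(xs, ys, best_t))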

Page 6:

A modified stump

Here we have 3 classification errors.

[Figure: the modified stump’s “no” branch receives 10.5 through 47.0; its “yes” branch receives 57.3 and 59.9]

Page 7:

Recursion…

Just another dataset! Build a stump!

[Figure: the two branches of the first split (no: 10.5, 14.1, 17.0, 21.0, 23.2; yes: 27.1, 30.1, 42.0, 47.0, 57.3, 59.9), each split again by its own stump with yes/no branches]
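A sketch of the recursive algorithm in Python, under simplifying assumptions not on the slide: 1-D continuous inputs, 0/1 labels, split quality judged by raw error count (the lecture later replaces this with information gain), and a hypothetical max_depth cap.

def stump_errors(xs, ys, threshold):
    # Errors made by predicting 1 for x > threshold, else 0.
    return sum((x > threshold) != (y == 1) for x, y in zip(xs, ys))

def majority(ys):
    return max(set(ys), key=ys.count)

def build_tree(xs, ys, depth=0, max_depth=3):
    # Base case: pure node, or depth limit reached.
    if len(set(ys)) == 1 or depth == max_depth:
        return {"predict": majority(ys)}
    # Fit a stump: the threshold with fewest errors on this dataset.
    t = min(xs, key=lambda c: stump_errors(xs, ys, c))
    no = [(x, y) for x, y in zip(xs, ys) if x <= t]
    yes = [(x, y) for x, y in zip(xs, ys) if x > t]
    if not no or not yes:  # the split separates nothing
        return {"predict": majority(ys)}
    # Each branch is just another dataset: build a stump on it, recursively.
    return {"threshold": t,
            "no": build_tree([x for x, _ in no], [y for _, y in no], depth + 1, max_depth),
            "yes": build_tree([x for x, _ in yes], [y for _, y in yes], depth + 1, max_depth)}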

Page 8:

Decision Trees = nested rules

[Figure: the two-level tree, with yes/no branches at each node, drawn over the 1-D data from 10 to 60]

if x > 25 then
    if x > 50 then y = 0 else y = 1; endif
else
    if x > 16 then y = 0 else y = 1; endif
endif
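The same rules transcribed directly into Python, for anyone who wants to run them:

def predict(x):
    # The nested rules above, verbatim.
    if x > 25:
        return 0 if x > 50 else 1
    else:
        return 0 if x > 16 else 1

print([predict(v) for v in [10.5, 21.0, 30.1, 57.3]])  # [1, 0, 1, 0]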

Page 9:

Trees build “orthogonal” decision boundaries.

Boundary is piecewise, and at 90 degrees to feature axes.

Page 10:

The most important concept in Machine Learning

Page 11:

Looks good so far…

The most important concept in Machine Learning

Page 12:

Looks good so far…

Oh no! Mistakes! What happened?

The most important concept in Machine Learning

Page 13:

Looks good so far…

Oh no! Mistakes! What happened?

We didn’t have all the data.

We can never assume that we do.

This is called “OVER-FITTING” to the small dataset.

The most important concept in Machine Learning

Page 14:

The number of possible paths down the tree tells you the number of rules. More rules = more complicated.

We could have N rules, where N is the number of examples in the training data… the more rules, the more chance of OVERFITTING.

[Figure: the two-level tree, one rule per root-to-leaf path]

Page 15:

Overfitting….

[Figure: testing error plotted against tree depth / length of rules (1 to 9); the error falls to a minimum at the optimal depth, then rises again as the tree starts to overfit]

Page 16:

Page 17:

Take a short break.

Talk to your neighbours.

Make sure you understand the recursive algorithm.

Page 18:

if x > 25 then
    if x > 50 then y = 0 else y = 1; endif
else
    if x > 16 then y = 0 else y = 1; endif
endif

Decision trees can be seen as nested rules. Nested rules are FAST, and highly parallelizable.
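To make “highly parallelizable” concrete, here is a sketch of the same rules vectorized with NumPy (an illustration, not from the slides): every comparison is applied to a whole array of inputs at once.

import numpy as np

def predict_batch(x):
    # Vectorized form of the nested rules: each comparison runs
    # over the entire array in one operation.
    x = np.asarray(x)
    return np.where(x > 25,
                    np.where(x > 50, 0, 1),
                    np.where(x > 16, 0, 1))

print(predict_batch([10.5, 21.0, 30.1, 57.3]))  # [1 0 1 0]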

- x,y,z-coordinates per joint, ~60 total
- x,y,z-velocities per joint, ~60 total
- joint angles (~35 total)
- joint angular velocities (~35 total)

Page 19:

“Real-time human pose recognition in parts from single depth images”, Shotton et al., Microsoft Research. Computer Vision and Pattern Recognition (CVPR), 2011.

• Trees are the basis of Kinect controller

• Features are simple image properties.

• Test phase: 200 frames per sec on GPU

• Train phase more complex but still parallel

Page 20:

We’ve been assuming continuous variables!

[Figure: 2-D scatter plots with axes running from 0 to 10, shown alongside the 1-D data from 10 to 60]

Page 21:

The Tennis Problem

Page 22:

Outlook
  SUNNY → Humidity
    HIGH → NO
    NORMAL → YES
  OVERCAST → YES
  RAIN → Wind
    STRONG → NO
    WEAK → YES
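The same tree written as nested rules over the categorical features; a minimal sketch, assuming each example is a dict of feature values:

def play_tennis(example):
    # Walk the tree above; example is e.g.
    # {"Outlook": "SUNNY", "Humidity": "HIGH", "Wind": "WEAK"}.
    if example["Outlook"] == "SUNNY":
        return "NO" if example["Humidity"] == "HIGH" else "YES"
    elif example["Outlook"] == "OVERCAST":
        return "YES"
    else:  # RAIN
        return "NO" if example["Wind"] == "STRONG" else "YES"

print(play_tennis({"Outlook": "RAIN", "Wind": "WEAK"}))  # YES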

Page 23:

Page 24:

The Tennis Problem

Note: 9 examples say “YES”, while 5 say “NO”.

Page 25:

Partitioning the data…

Page 26:

Thinking in Probabilities…

Page 27:

The “Information” in a feature

H(X) = 1

More uncertainty = less information

Page 28:

The “Information” in a feature

H(X) = 0.72193

Less uncertainty = more information

Page 29:

Entropy
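The formula itself was on the slide image, which is lost; the standard definition, consistent with the worked values on the previous two pages (H(X) = 1 for a 50/50 split, H(X) = 0.72193 for a 20/80 split), is

H(X) = -\sum_{x} p(x) \log_2 p(x)

and, for a binary variable with P(X = 1) = p,

H(X) = -p \log_2 p - (1 - p) \log_2 (1 - p)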

Page 30:

Calculating Entropy
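The slide’s worked example is lost, but the two entropy values quoted earlier are easy to verify in Python:

from math import log2

def entropy(ps):
    # Entropy in bits of a distribution given as a list of probabilities.
    return -sum(p * log2(p) for p in ps if p > 0)

print(entropy([0.5, 0.5]))  # 1.0            (maximum uncertainty)
print(entropy([0.2, 0.8]))  # about 0.72193  (less uncertainty)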

Page 31:

Information Gain, also known as “Mutual Information”

Page 32:

[Figure: information gain computed for each candidate feature; the root split is the feature with maximum information gain]
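A sketch of the computation in Python. The per-branch counts below assume the standard play-tennis dataset, which matches the 9 YES / 5 NO split noted earlier; with them, the gain for Outlook comes out at about 0.247 bits.

from collections import Counter
from math import log2

def entropy_of(labels):
    # Entropy in bits of the empirical label distribution.
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    # IG(Y; X) = H(Y) - H(Y | X), averaging over the feature's values.
    h_given = sum(
        (feature.count(v) / len(labels))
        * entropy_of([y for f, y in zip(feature, labels) if f == v])
        for v in set(feature))
    return entropy_of(labels) - h_given

# Outlook: SUNNY (2 YES, 3 NO), OVERCAST (4 YES, 0 NO), RAIN (3 YES, 2 NO)
outlook = ["SUNNY"] * 5 + ["OVERCAST"] * 4 + ["RAIN"] * 5
play = ["YES"] * 2 + ["NO"] * 3 + ["YES"] * 4 + ["YES"] * 3 + ["NO"] * 2
print(information_gain(outlook, play))  # about 0.247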

Page 33:

[Figure: the tree grown for the tennis problem. Outlook sits at the root (SUNNY / OVERCAST / RAIN), and the branches split further on Temp (HOT / MILD / COOL), Humidity (HIGH / NORMAL), and Wind (WEAK / STRONG), ending in YES and NO leaves]

Page 34:

Decision trees

Trees are learnt by a recursive algorithm, ID3 (but there are many variants of this).

Trees draw a particular type of decision boundary (look it up in these slides!)

Highly efficient, highly parallelizable, lots of possible “splitting” criteria.

Powerful methodology, used in lots of industry applications.