Hanyang University, Quest Lab.

Chapter 9
Classification and regression trees

Fall 2014, Department of IME, Hanyang University
9.2 Classification trees

• A data-driven method for classification (classification trees) and prediction (regression trees)
• The tree method was developed by Breiman et al. → CART
• Observations are separated into subgroups based on the predictors
• Goal: classify or predict an outcome based on a set of predictors
• The output is a set of rules
• Built by recursive partitioning and pruning, guided by the homogeneity of the resulting subgroups
9.1 Introduction

Example:
• Goal: classify a record as "will accept credit card offer" or "will not accept"
• A rule might be "IF (Income > 92.5) AND (Education < 1.5) AND (Family ≤ 2.5) THEN Class = 0 (nonacceptor)"
• Also called CART, decision trees, or just trees
• Rules are represented by tree diagrams
9.1 Introduction

[Figure: classification tree for the example. The number in a circle (decision) node is the splitting value, with the split predictor labeled at the node; a terminal (square) node is labeled acceptor (1) or nonacceptor (0); the number on each fork is the number of records.]
9.1 Introduction

"IF (Income > 92.5) AND (Education < 1.5) AND (Family ≤ 2.5) THEN Class = 0 (nonacceptor)"
9.2 Classification trees

Recursive partitioning
• Choose a predictor x_j and a split value s that divide the p-dimensional predictor space into two parts: {(x_1, …, x_p) : x_j ≤ s} and {(x_1, …, x_p) : x_j > s}
• Measure how "pure" or homogeneous each of the resulting parts is ("pure" = containing records of mostly one class)
• The algorithm tries different values of x_j and s to maximize the purity of the split
• After finding the "maximum purity" split, repeat the process for a second split, and so on
• Repeat until we get "pure" rectangles (each belonging to one class)
9.2 Classification trees

Example 1: Riding mowers
• 12 owners and 12 non-owners of riding mowers
• Predictors: Income (x1) and Lot_Size (x2)

Income   Lot_Size   Ownership
60.0     18.4       owner
85.5     16.8       owner
64.8     21.6       owner
61.5     20.8       owner
87.0     23.6       owner
110.1    19.2       owner
108.0    17.6       owner
82.8     22.4       owner
69.0     20.0       owner
93.0     20.8       owner
51.0     22.0       owner
81.0     20.0       owner
75.0     19.6       non-owner
52.8     20.8       non-owner
64.8     17.2       non-owner
43.2     20.4       non-owner
84.0     17.6       non-owner
49.2     17.6       non-owner
59.4     16.0       non-owner
66.0     18.4       non-owner
47.4     16.4       non-owner
33.0     18.8       non-owner
51.0     14.0       non-owner
63.0     14.8       non-owner
9.2 Classification trees

The first split: Lot_Size = 19,000
[Figure: scatter plot of Income vs. Lot_Size, split at Lot_Size = 19,000]

First split – the tree
[Figure: one-node tree asking "Is Lot_Size ≤ 19,000?", with a Yes branch and a No branch]
9.3 Measure of impurity

Gini impurity index
• Assume there are m classes
• p_k = the proportion of observations in rectangle A that belong to class k
• I(A) = 1 − Σ_{k=1..m} p_k², with 0 ≤ I(A) ≤ (m − 1)/m
• If all observations belong to the same class, I(A) = 0 (why?)
• If all classes are equally represented, I(A) = 1 − 1/m = (m − 1)/m
• In the two-class case, I(A) is maximized at p = 0.5
9.3 Measure of impurity

Entropy measure
• Assume there are m classes
• p_k = the proportion of observations in rectangle A that belong to class k
• Entropy(A) = −Σ_{k=1..m} p_k log₂(p_k), with 0 ≤ Entropy(A) ≤ log₂(m)
• If all observations belong to the same class, Entropy(A) = 0 (why?)
• If all classes are equally represented, Entropy(A) = log₂(m)
• In the two-class case, Entropy(A) is maximized at p = 0.5
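A minimal Python sketch of the two impurity measures, checking the endpoint values claimed above (it assumes class proportions are supplied as a list):

from math import log2

def gini(p):
    # Gini impurity index: I(A) = 1 - sum of p_k^2
    return 1 - sum(pk ** 2 for pk in p)

def entropy(p):
    # Entropy(A) = -sum of p_k * log2(p_k); a p_k of 0 contributes 0
    return -sum(pk * log2(pk) for pk in p if pk > 0)

# Pure node (all observations in one class): both measures are 0.
print(gini([1.0, 0.0]), entropy([1.0, 0.0]))   # 0.0 0.0

# Equally distributed node with m classes: Gini = (m-1)/m, entropy = log2(m).
m = 4
uniform = [1 / m] * m
print(gini(uniform), (m - 1) / m)              # 0.75 0.75
print(entropy(uniform), log2(m))               # 2.0 2.0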
9.3 Measure of impurity

[Figure: node impurity as a function of p, peaking at p = 0.5]
Note that the impurity measures are at their peak when p = 0.5, that is, when the data contain 50% of each of the two classes.
9.3 Measure of impurity

In Example 1:
• Before the split (12 owners and 12 non-owners): Gini = 0.5, Entropy = 1
• After the split (x2 ≥ 19 and x2 < 19):
  – Upper rectangle (9 owners and 3 non-owners): Gini = 0.375, Entropy = 0.811
  – Lower rectangle (3 owners and 9 non-owners): Gini = 0.375, Entropy = 0.811
  – Combined impurity (weighted average): Gini = 0.375, Entropy = 0.811
• Both impurity indices decrease after the split.
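These numbers can be reproduced directly from the class counts in each rectangle; a small self-contained Python check:

from math import log2

def gini(counts):
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

def entropy(counts):
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)

before = [12, 12]   # 12 owners, 12 non-owners
upper  = [9, 3]     # x2 >= 19: 9 owners, 3 non-owners
lower  = [3, 9]     # x2 <  19: 3 owners, 9 non-owners

print(gini(before), entropy(before))   # 0.5 1.0
print(gini(upper), entropy(upper))     # 0.375 0.811...
# Weighted average over the two rectangles (12 records each out of 24):
print((12 * gini(upper) + 12 * gini(lower)) / 24)        # 0.375
print((12 * entropy(upper) + 12 * entropy(lower)) / 24)  # 0.811...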
9.2 Classification trees

Second split: Income = $84,750
Third split: Income = $57,150
9.2 Classification trees

[Figure: tree after three splits]
[Figure: after all splits (until when?)]
9.3 Measure of impurity

Tree after all splits
• Decision node: the number in the circle is the splitting value, and the split variable is written below the circle
• Fork: the left branch holds records less than or equal to the splitting value, the right branch those greater; the number on a fork is the number of records
• Leaf: a final rectangle
9.3 Measure of impurity

Determining the leaf node label
• Each leaf node's label is determined by a "vote" of the records within it, together with a cutoff value
• The records within each leaf node come from the training data
• The default cutoff of 0.5 means that the leaf node's label is the majority class
• Cutoff = 0.75: a majority of 75% or more "1" records in the leaf is required to label it a "1" node
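A one-function sketch of this voting rule (the leaf counts in the demo are hypothetical):

def leaf_label(n_ones, n_total, cutoff=0.5):
    # Label the leaf "1" iff the proportion of class-1 training records
    # in it reaches the cutoff; otherwise label it "0".
    return 1 if n_ones / n_total >= cutoff else 0

print(leaf_label(7, 10))                # 1: 70% >= 50%, majority vote
print(leaf_label(7, 10, cutoff=0.75))   # 0: 70% < 75%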
9.3 Measure of impurity

Then, at a node, how do we determine the split value?
① Generate candidate split questions
② Compute the information gain (IG) for each question
③ Choose the question with the largest IG

When do we stop?
① When impurity = 0
② When the number of records in a leaf falls below some threshold
③ When the IG falls below some threshold
④ Pruning
9.3 Measure of impurity

Then, at a node, how do we determine the split value?
① Generate candidate split questions
② Compute the information gain (IG) for each question, and choose the question with the largest IG

IG = the reduction in entropy caused by partitioning the data according to a given variable.

Ex. Suppose a node S is a collection of 14 examples [9 yes's, 5 no's]. Then
Entropy([9 yes, 5 no]) = −Σ_{i=1..2} p_i log₂(p_i) = −(9/14) log₂(9/14) − (5/14) log₂(5/14) = 0.94
9.3 Measure of impurity

IG(S, x_i) = Entropy(S) − Σ_{v ∈ values(x_i)} (|S_v| / |S|) · Entropy(S_v)

• values(x_i): the set of all possible values of variable x_i
• S_v: the subset of S for which variable x_i has value v
9.3 Measure of impurity
Outlook Temperature Humidity Windy Play?
sunny hot high false No
sunny hot high true No
overcast hot high false Yes
rain mild high false Yes
rain cool normal false Yes
rain cool normal true No
overcast cool normal true Yes
sunny mild high false No
sunny cool normal false Yes
rain mild normal false Yes
sunny mild normal true Yes
overcast mild high true Yes
overcast hot normal false Yes
rain mild high true No
values(Windy) = {false, true}
S = [9 yes, 5 no], S_false = [6 yes, 2 no], S_true = [3 yes, 3 no]

IG(S, Windy) = Entropy(S) − Σ_{v ∈ {false, true}} (|S_v| / |S|) · Entropy(S_v)
             = Entropy(S) − (8/14) · Entropy(S_false) − (6/14) · Entropy(S_true)
             = 0.048

Similarly, IG(S, Humidity) = 0.151.
Humidity provides greater information gain than Windy.
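The following sketch recomputes both gains from the weather table above; it should reproduce IG(S, Windy) ≈ 0.048 and IG(S, Humidity) ≈ 0.152 (reported as 0.151 on the slide):

from math import log2
from collections import Counter

# (Outlook, Temperature, Humidity, Windy, Play?) rows from the table above.
data = [
    ("sunny", "hot", "high", "false", "No"),
    ("sunny", "hot", "high", "true", "No"),
    ("overcast", "hot", "high", "false", "Yes"),
    ("rain", "mild", "high", "false", "Yes"),
    ("rain", "cool", "normal", "false", "Yes"),
    ("rain", "cool", "normal", "true", "No"),
    ("overcast", "cool", "normal", "true", "Yes"),
    ("sunny", "mild", "high", "false", "No"),
    ("sunny", "cool", "normal", "false", "Yes"),
    ("rain", "mild", "normal", "false", "Yes"),
    ("sunny", "mild", "normal", "true", "Yes"),
    ("overcast", "mild", "high", "true", "Yes"),
    ("overcast", "hot", "normal", "false", "Yes"),
    ("rain", "mild", "high", "true", "No"),
]
cols = {"Outlook": 0, "Temperature": 1, "Humidity": 2, "Windy": 3}

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, col):
    # IG(S, x) = Entropy(S) - sum over values v of (|S_v|/|S|) * Entropy(S_v)
    ig = entropy([r[-1] for r in rows])
    for v in set(r[col] for r in rows):
        sv = [r[-1] for r in rows if r[col] == v]
        ig -= len(sv) / len(rows) * entropy(sv)
    return ig

print(info_gain(data, cols["Windy"]))     # 0.0481...
print(info_gain(data, cols["Humidity"]))  # 0.1518...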
9.4 Evaluating the performance of a classification tree

Example 2: Acceptance of a personal loan
• Analyze the previous campaign to design a better one
• What combination of factors makes customers accept a personal loan?
• Predictors: demographic information (age, income, etc.), response to the last campaign, relationship with the bank (mortgage, securities account, etc.)
• In the last campaign, 480 out of 5,000 customers accepted (9.6%)
9.4 Evaluating the performance of a classification tree

Partition: 2,500 training records, 1,500 validation records, 1,000 test records
9.4 Evaluating the performance of a classification tree
The full-grown tree is 100% accurate on the training set: every leaf of the full-grown tree is a pure class.
→ the problem of overfitting
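A hedged illustration of the same effect with scikit-learn on synthetic data (a stand-in for the XLMiner output shown on these slides; the data-generating process below is made up, and the 2,500/1,500 partition echoes the one above):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(4000, 5))                                    # synthetic predictors
y = (X[:, 0] + X[:, 1] + rng.normal(size=4000) > 0).astype(int)   # noisy outcome

# 2,500 training records, 1,500 validation records.
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=1500, random_state=1)

full_tree = DecisionTreeClassifier(random_state=1).fit(X_tr, y_tr)  # grown until pure
print(full_tree.score(X_tr, y_tr))   # 1.0: every leaf is a pure class
print(full_tree.score(X_va, y_va))   # noticeably below 1.0 on the validation set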
9.4 Evaluating the performance of a classification tree

• Accuracy is not 100% on the validation and test sets
• We must either stop before the tree is full-grown, or prune the full-grown tree back
9.5 Avoiding overfitting

"Overfitting": caused by splits based on too small a number of observations
→ either stop tree growth early (CHAID) or prune the full-grown tree back (CART, C4.5)
9.5 Avoiding overfitting

Stopping tree growth: CHAID
• Stop when? At a maximum tree depth? A minimum number of observations in a node? A minimum reduction in impurity? etc.
• CHAID (chi-squared automatic interaction detection): splitting stops when the purity improvement is not statistically significant under a chi-square test
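A rough sketch of the CHAID-style stopping decision, using a chi-square test of independence on the (branch × class) counts of a candidate split; the counts are hypothetical, and real CHAID additionally merges categories and applies multiple-testing corrections:

from scipy.stats import chi2_contingency

# Hypothetical candidate split: rows = left/right branch, cols = class counts.
table = [[40, 15],
         [12, 33]]

chi2, p_value, dof, expected = chi2_contingency(table)
if p_value < 0.05:
    print(f"p = {p_value:.4f}: purity improvement is significant, keep splitting")
else:
    print(f"p = {p_value:.4f}: not significant, stop growing this branch")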
9.5 Avoiding overfitting

Pruning the tree
• Let the tree grow to its full extent, then prune it back
• In the riding-mower example, the rectangles produced by the last few splits contain only single records → they can be regarded as fitting noise
• Pruning means converting a decision node into a leaf
• The goal is to find the point at which the error rate on unseen data starts to increase
9.5 Avoiding overfitting

Pruning the tree (CART)
• Use the cost complexity (CC) of a tree T: CC(T) = err(T) + α · L(T), where
  – err(T) = the cost of the misclassifications made by tree T
  – L(T) = the number of leaves in T
  – α = a penalty factor (set by the user)
• As the tree grows, the error decreases but the penalty term grows
• The larger α is, the higher the CC of a large tree
• Among trees of a given size (number of decision nodes), choose the one with the lowest CC
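A small sketch of the CC comparison; the candidate subtrees' error rates, leaf counts, and the α below are all hypothetical:

def cost_complexity(err, n_leaves, alpha):
    # CC(T) = err(T) + alpha * L(T)
    return err + alpha * n_leaves

# Hypothetical subtrees of the full-grown tree: (training error, number of leaves).
candidates = [(0.00, 20), (0.01, 12), (0.03, 7), (0.08, 3)]
alpha = 0.005

for err, leaves in candidates:
    print(leaves, "leaves -> CC =", round(cost_complexity(err, leaves, alpha), 3))
# A larger alpha penalizes big trees more, shifting the minimum toward smaller trees.
print(min(candidates, key=lambda t: cost_complexity(*t, alpha)))   # (0.03, 7)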
9.5 Avoiding overfitting

• Do this for each tree size
  → for each size, we obtain the tree with the minimum CC
• Among these trees, choose the one whose misclassification error on the validation set is minimum (the "minimum error tree"),
• or choose the smallest tree whose validation error is within one standard error of the minimum error tree (the "best pruned tree"), where the standard error of an error rate p is √(p(1 − p)/n) and n is the number of records in the validation set
• The one-standard-error rule is a correction for sampling error (the validation data were also used in the pruning process, and a different sample could have given a different result)
9.5 Avoiding overfitting

[Figure: validation error versus tree size, marking the full-grown tree, the minimum error tree, and the best pruned tree]

Minimum error tree: validation error = 0.0147. Best pruned tree: the smallest tree with validation error within one standard error of this minimum:
0.0147 + √(0.0147 × (1 − 0.0147) / 1500) = 0.0147 + 0.0031 = 0.0178 = 1.78%
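This threshold can be reproduced directly (p = 0.0147 is the minimum validation error, n = 1,500 is the validation-set size):

from math import sqrt

p, n = 0.0147, 1500
se = sqrt(p * (1 - p) / n)      # standard error of the validation error rate
print(round(se, 4))             # 0.0031
print(round(p + se, 4))         # 0.0178: the best pruned tree is the smallest
                                # tree with validation error <= 1.78%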
9.6 Classification rules from trees

A tree translates directly into a set of classification rules.
Ex. A rule might be "IF (Income > 92.5) AND (Education < 1.5) AND (Family ≤ 2.5) THEN Class = 0 (nonacceptor)"
• Extends to the case of more than two classes
• Each leaf node is then labeled with one of the m classes
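A sketch of extracting such rules by walking a tree: the nested-dict representation below is made up for illustration, with splits echoing the quoted rule (leaf labels other than the quoted one are invented):

def rules(node, conditions=()):
    # Emit one IF-THEN rule per leaf by accumulating split conditions.
    if "class" in node:
        print("IF", " AND ".join(conditions), "THEN Class =", node["class"])
        return
    var, s = node["var"], node["split"]
    rules(node["left"], conditions + (f"({var} <= {s})",))
    rules(node["right"], conditions + (f"({var} > {s})",))

tree = {"var": "Income", "split": 92.5,
        "left": {"class": 1},
        "right": {"var": "Education", "split": 1.5,
                  "left": {"var": "Family", "split": 2.5,
                           "left": {"class": 0},      # the quoted rule's leaf
                           "right": {"class": 1}},
                  "right": {"class": 1}}}
rules(tree)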
9.8 Regression trees

• Regression trees are used with a continuous outcome variable
• The procedure is similar to that for a classification tree
• Many splits are attempted; choose the one that minimizes impurity

Ex. Predicting Toyota Corolla prices (see Ch. 6): 10 predictors, 600 training records
Best tree: only two predictors turn out to be useful, Age and Horse Power.
9.8 Regression trees

Prediction
• Cf. a classification tree, where the class of a leaf is decided by a "vote" of the data in that leaf
• In a regression tree, the value of a leaf node is the average of the data in that leaf

Measuring impurity
• Typical measure: the sum of squared deviations from the mean of the leaf

Evaluating performance
• The same way as for other predictive methods: RMSE (root mean squared error), lift charts, etc.
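A minimal sketch of these two leaf computations (the values in the example leaf are hypothetical Corolla prices):

def leaf_prediction(values):
    # A regression-tree leaf predicts the average of its training records.
    return sum(values) / len(values)

def leaf_impurity(values):
    # Typical impurity: sum of squared deviations from the leaf mean.
    mean = leaf_prediction(values)
    return sum((v - mean) ** 2 for v in values)

prices = [8500, 9100, 8800, 9400]    # hypothetical prices in one leaf
print(leaf_prediction(prices))       # 8950.0
print(leaf_impurity(prices))         # 450000.0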
9.9 Advantages, weaknesses, and extensions

• A good classifier, and useful for variable selection
• No need to transform the variables
• Variables are selected automatically through the splits (after pruning, only a subset of the variables may remain)
• Robust to outliers (a split depends only on the rank order of the records' values, not on their magnitude)
• But sensitive to changes in the data (a slight change can produce a very different set of splits)
• No particular relationship between the predictors and the response is assumed in advance (cf. linear regression, which assumes a linear relationship)
9.9 Advantages, weaknesses, and extensions

• Since each split uses only one predictor, interactions between predictors are not taken into account
• Performance suffers when horizontal and vertical splits do not fit the structure of the data
• In such cases, discriminant analysis (Ch. 12) is preferable
Ch. 9 Problems
• 9.1 (eBay.com)
• 9.2 (Flight delay)
• 9.3 (Regression tree – Toyota Corolla)