
Radosław Wesołowski Tomasz Pękalski, Michal Borkowicz, Maciej Kopaczyński 12-03-2008


Page 1

Page 2

What is it anyway?

Decision tree T – a tree with a root (in the graph-theory sense), in which we assign the following meanings to its elements:
- inner nodes represent attributes,
- edges represent values of the attribute,
- leaves represent classification decisions.

Using a decision tree we can visualize a program with only ‘if-then’ instructions.
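The idea above can be sketched directly as code. This is a minimal illustration, not an example from the slides: the attributes (outlook, humidity), their values, and the decisions are assumed for the sake of the sketch.

```python
# A tiny decision tree written directly as 'if-then' instructions.
# Inner nodes test attributes, edges are attribute values, leaves are decisions.
def classify(outlook: str, humidity: str) -> str:
    if outlook == "sunny":          # inner node: attribute 'outlook'
        if humidity == "high":      # inner node: attribute 'humidity'
            return "don't play"     # leaf: classification decision
        return "play"
    elif outlook == "rainy":
        return "don't play"
    return "play"                   # e.g. overcast
```

Each root-to-leaf path corresponds to one chain of if-then conditions.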

Page 3

Page 4

Page 5

Testing functions

Let us consider an attribute A (e.g. temperature). Let V_A denote the set of all possible values of A (0 K up to infinity). Let R_t denote the set of all possible test results (hot, mild, cold). By a testing function we mean a map

t: V_A → R_t

We distinguish two main types of testing functions, depending on the set V_A: discrete and continuous.
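A testing function for the temperature example might look as follows. The thresholds (285 K and 300 K) are assumptions chosen only to make the sketch concrete; the slides do not specify them.

```python
# A testing function t: V_A -> R_t for the continuous attribute 'temperature'.
# V_A is [0, infinity) in kelvin; R_t = {cold, mild, hot}.
# The cut points 285 K and 300 K are illustrative assumptions.
def t(temperature_k: float) -> str:
    if temperature_k < 285.0:
        return "cold"
    elif temperature_k < 300.0:
        return "mild"
    return "hot"
```

A discrete attribute would instead map each of its finitely many values to a result directly.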

Page 6

Quality of a decision tree (Occam's razor):
- we prefer small, simple trees,
- we want to gain maximum accuracy of classification (training set, test set).

For example:

Q(T) = α·size(T) + β·accuracy(T)
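Such a weighted criterion can be sketched in a few lines. The concrete weights are assumptions, not values from the slides; here α is taken negative so that larger trees are penalized, in line with Occam's razor.

```python
# Hedged sketch of a quality criterion combining tree size and accuracy.
# alpha < 0 penalizes size; beta > 0 rewards accuracy. Both weights are assumed.
def quality(size: int, accuracy: float,
            alpha: float = -0.01, beta: float = 1.0) -> float:
    return alpha * size + beta * accuracy
```

With these weights, a small tree beats a large tree of equal accuracy.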

Page 7

Optimal tree – we are given:
- a training set S,
- a set of testing functions TEST,
- a quality criterion Q.

Target: T optimising Q(T).

Fact: usually this is an NP-hard problem.

Conclusion: we have to use heuristics.

Page 8

Building a decision tree:
- top-down method:
  a. In the beginning the root includes all training examples.
  b. We divide them recursively, choosing one attribute at a time.
- bottom-up: we remove subtrees or edges to gain precision for judging new cases.
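The top-down method can be sketched as a short recursive function. This is an illustrative ID3-style sketch under assumed data shapes (each example is a pair of an attribute dictionary and a label), not the exact procedure from the slides; it greedily picks the attribute with the highest information gain at each node.

```python
import math
from collections import Counter

def entropy(labels):
    # Average bits to encode a label drawn from this multiset.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def build(examples, attributes):
    """Top-down: the root receives all examples, then we split recursively."""
    labels = [label for _, label in examples]
    if len(set(labels)) == 1 or not attributes:      # stop: pure node or no attributes left
        return Counter(labels).most_common(1)[0][0]  # leaf = majority decision

    def gain(a):  # information gain of splitting on attribute a
        after = 0.0
        for v in set(attrs[a] for attrs, _ in examples):
            sub = [lab for attrs, lab in examples if attrs[a] == v]
            after += len(sub) / len(examples) * entropy(sub)
        return entropy(labels) - after

    best = max(attributes, key=gain)                 # choose one attribute at a time
    node = {}
    for v in set(attrs[best] for attrs, _ in examples):
        subset = [e for e in examples if e[0][best] == v]
        node[(best, v)] = build(subset, [a for a in attributes if a != best])
    return node

# Usage with assumed toy data:
examples = [
    ({"outlook": "sunny", "humidity": "high"}, "no"),
    ({"outlook": "sunny", "humidity": "low"}, "yes"),
    ({"outlook": "rain", "humidity": "high"}, "no"),
]
tree = build(examples, ["outlook", "humidity"])
```

The bottom-up step (pruning subtrees or edges) would then operate on the returned nested structure.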

Page 9

Page 10

Entropy – the average number of bits needed to represent a decision d for a randomly chosen object from a given set S. Why? Because an optimal binary representation assigns −log2(p) bits to a decision whose probability is p. We have the formula:

entropy(p1, ..., pn) = −p1·log2(p1) − ... − pn·log2(pn)
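This formula translates directly into code:

```python
import math

# entropy(p1, ..., pn) = -p1*log2(p1) - ... - pn*log2(pn)
# Zero probabilities contribute nothing (the limit of p*log2(p) as p -> 0 is 0).
def entropy(probabilities):
    return -sum(p * math.log2(p) for p in probabilities if p > 0)
```

A fair yes/no decision (p1 = p2 = 0.5) costs exactly 1 bit, and a certain decision (p = 1) costs 0 bits.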

Page 11

Page 12

Page 13

Information gain:
gain(.) = info before dividing − info after dividing
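In code, "info after dividing" is the size-weighted average entropy of the subsets produced by a split. A self-contained sketch, with the data shapes (label lists) assumed for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    # Entropy of the empirical label distribution of a set.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, subsets):
    # gain = info before dividing - info after dividing,
    # where "after" weights each subset's entropy by its relative size.
    n = len(labels)
    after = sum(len(s) / n * entropy(s) for s in subsets)
    return entropy(labels) - after
```

Splitting a perfectly mixed two-class set into two pure halves gains one full bit.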

Page 14

Page 15

Overtraining: we say that a model H overfits if there is a model H’ such that:
- training_error(H) < training_error(H’),
- testing_error(H) > testing_error(H’).

Avoiding overtraining:
- adequate stopping criteria,
- post-pruning,
- pre-pruning.

Page 16

Some decision-tree algorithms:

- 1R,
- ID3 (Iterative Dichotomiser 3),
- C4.5 (ID3 + discretization + pruning),
- CART (Classification and Regression Trees),
- CHAID (CHi-squared Automatic Interaction Detection).