
Radosław Wesołowski Tomasz Pękalski, Michal Borkowicz, Maciej Kopaczyński 12-03-2008


Page 1

Page 2

What is it anyway?

Decision tree T – a tree with a root (in the graph-theory sense), in which we assign the following meanings to its elements:
- inner nodes represent attributes,
- edges represent values of the attribute,
- leaves represent classification decisions.

Using a decision tree we can visualize a program with only ‘if-then’ instructions.
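The idea above can be sketched directly as code. This is a minimal illustration, not an example from the slides: the attributes (outlook, humidity), their values, and the decisions are assumed for the sake of the sketch.

```python
# A tiny decision tree written directly as 'if-then' instructions.
# Inner nodes test attributes, edges are attribute values, leaves are decisions.
def classify(outlook: str, humidity: str) -> str:
    if outlook == "sunny":          # inner node: attribute 'outlook'
        if humidity == "high":      # inner node: attribute 'humidity'
            return "don't play"     # leaf: classification decision
        return "play"
    elif outlook == "rainy":
        return "don't play"
    return "play"                   # e.g. overcast
```

Each root-to-leaf path corresponds to one chain of if-then conditions.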

Page 3

Page 4

Page 5

Testing functions

Let us consider an attribute A (e.g. temperature). Let V_A denote the set of all possible values of A (0 K up to infinity). Let R_t denote the set of all possible test results (hot, mild, cold). By a testing function we mean a map

t: V_A → R_t

We distinguish two main types of testing functions, depending on the set V_A: discrete and continuous.
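A testing function for the temperature example might look as follows. The thresholds (285 K and 300 K) are assumptions chosen only to make the sketch concrete; the slides do not specify them.

```python
# A testing function t: V_A -> R_t for the continuous attribute 'temperature'.
# V_A is [0, infinity) in kelvin; R_t = {cold, mild, hot}.
# The cut points 285 K and 300 K are illustrative assumptions.
def t(temperature_k: float) -> str:
    if temperature_k < 285.0:
        return "cold"
    elif temperature_k < 300.0:
        return "mild"
    return "hot"
```

A discrete attribute would instead map each of its finitely many values to a result directly.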

Page 6

Quality of a decision tree (Occam's razor):
- we prefer small, simple trees,
- we want to gain maximum accuracy of classification (training set, test set).

For example:

Q(T) = α·size(T) + β·accuracy(T)
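Such a weighted criterion can be sketched in a few lines. The concrete weights are assumptions, not values from the slides; here α is taken negative so that larger trees are penalized, in line with Occam's razor.

```python
# Hedged sketch of a quality criterion combining tree size and accuracy.
# alpha < 0 penalizes size; beta > 0 rewards accuracy. Both weights are assumed.
def quality(size: int, accuracy: float,
            alpha: float = -0.01, beta: float = 1.0) -> float:
    return alpha * size + beta * accuracy
```

With these weights, a small tree beats a large tree of equal accuracy.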

Page 7

Optimal tree – we are given:
- a training set S,
- a set of testing functions TEST,
- a quality criterion Q.

Target: T optimising Q(T).

Fact: usually this is an NP-hard problem.

Conclusion: we have to use heuristics.

Page 8

Building a decision tree:
- top-down method:
  a. In the beginning the root includes all training examples.
  b. We divide them recursively, choosing one attribute at a time.
- bottom-up: we remove subtrees or edges to gain precision for judging new cases.
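The top-down method can be sketched as a short recursive function. This is an illustrative ID3-style sketch under assumed data shapes (each example is a pair of an attribute dictionary and a label), not the exact procedure from the slides; it greedily picks the attribute with the highest information gain at each node.

```python
import math
from collections import Counter

def entropy(labels):
    # Average bits to encode a label drawn from this multiset.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def build(examples, attributes):
    """Top-down: the root receives all examples, then we split recursively."""
    labels = [label for _, label in examples]
    if len(set(labels)) == 1 or not attributes:      # stop: pure node or no attributes left
        return Counter(labels).most_common(1)[0][0]  # leaf = majority decision

    def gain(a):  # information gain of splitting on attribute a
        after = 0.0
        for v in set(attrs[a] for attrs, _ in examples):
            sub = [lab for attrs, lab in examples if attrs[a] == v]
            after += len(sub) / len(examples) * entropy(sub)
        return entropy(labels) - after

    best = max(attributes, key=gain)                 # choose one attribute at a time
    node = {}
    for v in set(attrs[best] for attrs, _ in examples):
        subset = [e for e in examples if e[0][best] == v]
        node[(best, v)] = build(subset, [a for a in attributes if a != best])
    return node

# Usage with assumed toy data:
examples = [
    ({"outlook": "sunny", "humidity": "high"}, "no"),
    ({"outlook": "sunny", "humidity": "low"}, "yes"),
    ({"outlook": "rain", "humidity": "high"}, "no"),
]
tree = build(examples, ["outlook", "humidity"])
```

The bottom-up step (pruning subtrees or edges) would then operate on the returned nested structure.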

Page 9

Page 10

Entropy – the average number of bits needed to represent a decision d for a randomly chosen object from a given set S. Why? Because an optimal binary representation assigns −log2(p) bits to a decision whose probability is p. We have the formula:

entropy(p1, ..., pn) = −p1·log2(p1) − ... − pn·log2(pn)
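This formula translates directly into code:

```python
import math

# entropy(p1, ..., pn) = -p1*log2(p1) - ... - pn*log2(pn)
# Zero probabilities contribute nothing (the limit of p*log2(p) as p -> 0 is 0).
def entropy(probabilities):
    return -sum(p * math.log2(p) for p in probabilities if p > 0)
```

A fair yes/no decision (p1 = p2 = 0.5) costs exactly 1 bit, and a certain decision (p = 1) costs 0 bits.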

Page 11

Page 12

Page 13

Information gain:
gain(.) = info before dividing − info after dividing
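In code, "info after dividing" is the size-weighted average entropy of the subsets produced by a split. A self-contained sketch, with the data shapes (label lists) assumed for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    # Entropy of the empirical label distribution of a set.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, subsets):
    # gain = info before dividing - info after dividing,
    # where "after" weights each subset's entropy by its relative size.
    n = len(labels)
    after = sum(len(s) / n * entropy(s) for s in subsets)
    return entropy(labels) - after
```

Splitting a perfectly mixed two-class set into two pure halves gains one full bit.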

Page 14

Page 15

Overtraining: we say that a model H overfits if there is a model H’ such that:
- training_error(H) < training_error(H’),
- testing_error(H) > testing_error(H’).

Avoiding overtraining:
- adequate stopping criteria,
- post-pruning,
- pre-pruning.

Page 16

Some decision-tree algorithms:

- 1R,
- ID3 (Iterative Dichotomiser 3),
- C4.5 (ID3 + discretization + pruning),
- CART (Classification and Regression Trees),
- CHAID (CHi-squared Automatic Interaction Detection).