The Restaurant Domain
Will they wait, or not?

Example  Fri  Hun  Pat   Price  Rain  Res  Type     Est    WillWait
X1       No   Yes  Some  $$$    No    Yes  French   0-10   Yes
X2       No   Yes  Full  $      No    No   Thai     30-60  No
X3       No   No   Some  $      No    No   Burger   0-10   Yes
X4       Yes  Yes  Full  $      No    No   Thai     10-30  Yes
X5       Yes  No   Full  $$$    No    Yes  French   >60    No
X6       No   Yes  Some  $$     Yes   Yes  Italian  0-10   Yes
X7       No   No   None  $      Yes   No   Burger   0-10   No
X8       No   Yes  Some  $$     Yes   Yes  Thai     0-10   Yes
X9       Yes  No   Full  $      Yes   No   Burger   >60    No
X10      Yes  Yes  Full  $$$    No    Yes  Italian  10-30  No
X11      No   No   None  $      No    No   Thai     0-10   No
X12      Yes  Yes  Full  $      No    No   Burger   30-60  Yes

(Fri through Est are the input attributes; Est is the estimated wait in minutes. WillWait is the goal.)
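For concreteness, the examples can also be written down in code; the sketches later in these notes assume this purely illustrative encoding, with ATTRS and EXAMPLES as made-up names:

```python
# Each example maps attribute names to values; "WillWait" is the goal label.
ATTRS = ["Fri", "Hun", "Pat", "Price", "Rain", "Res", "Type", "Est"]

ROWS = [
    # Fri   Hun    Pat     Price  Rain   Res    Type       Est      WillWait
    ("No",  "Yes", "Some", "$$$", "No",  "Yes", "French",  "0-10",  "Yes"),  # X1
    ("No",  "Yes", "Full", "$",   "No",  "No",  "Thai",    "30-60", "No"),   # X2
    ("No",  "No",  "Some", "$",   "No",  "No",  "Burger",  "0-10",  "Yes"),  # X3
    ("Yes", "Yes", "Full", "$",   "No",  "No",  "Thai",    "10-30", "Yes"),  # X4
    ("Yes", "No",  "Full", "$$$", "No",  "Yes", "French",  ">60",   "No"),   # X5
    ("No",  "Yes", "Some", "$$",  "Yes", "Yes", "Italian", "0-10",  "Yes"),  # X6
    ("No",  "No",  "None", "$",   "Yes", "No",  "Burger",  "0-10",  "No"),   # X7
    ("No",  "Yes", "Some", "$$",  "Yes", "Yes", "Thai",    "0-10",  "Yes"),  # X8
    ("Yes", "No",  "Full", "$",   "Yes", "No",  "Burger",  ">60",   "No"),   # X9
    ("Yes", "Yes", "Full", "$$$", "No",  "Yes", "Italian", "10-30", "No"),   # X10
    ("No",  "No",  "None", "$",   "No",  "No",  "Thai",    "0-10",  "No"),   # X11
    ("Yes", "Yes", "Full", "$",   "No",  "No",  "Burger",  "30-60", "Yes"),  # X12
]

# Dicts keyed by attribute name, e.g. EXAMPLES[0]["Pat"] == "Some".
EXAMPLES = [dict(zip(ATTRS + ["WillWait"], row)) for row in ROWS]
```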
Decision Trees

A decision tree for the restaurant domain:

Patrons?
  none  -> No
  some  -> Yes
  full  -> WaitEst?
    >60   -> No
    30-60 -> Alternate?
      no  -> Reservation?
        no  -> Bar?
          no  -> No
          yes -> Yes
        yes -> Yes
      yes -> Fri/Sat?
        no  -> No
        yes -> Yes
    10-30 -> Hungry?
      no  -> Yes
      yes -> Alternate?
        no  -> Yes
        yes -> Raining?
          no  -> No
          yes -> Yes
    0-10  -> Yes
Inducing Decision Trees
Start at the root with all the examples:
- If there are both positive and negative examples, choose an attribute to split them and recurse on each subset.
- If all remaining examples are positive (or all negative), label the leaf Yes (or No).
- If no examples remain, label according to the majority in the parent node.
- If no attributes are left but you still have both positive and negative examples, you have a problem... (the data is noisy, or the attributes are insufficient to discriminate).
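A minimal Python sketch of this recursion, assuming the illustrative EXAMPLES/ATTRS encoding above; the attribute-selection heuristic is deliberately left as a parameter, choose (an information-gain version is sketched later in these notes):

```python
from collections import Counter

def majority(examples):
    """The most common goal label among the examples."""
    return Counter(e["WillWait"] for e in examples).most_common(1)[0][0]

def induce_tree(examples, attrs, choose, parent_majority="No"):
    if not examples:                      # no examples: inherit parent majority
        return parent_majority
    labels = {e["WillWait"] for e in examples}
    if len(labels) == 1:                  # all positive or all negative
        return labels.pop()
    if not attrs:                         # mixed labels, no attributes left:
        return majority(examples)         # the "problem" case; fall back
    a = choose(examples, attrs)           # pick an attribute to split on
    branches = {}
    for v in {e[a] for e in examples}:
        subset = [e for e in examples if e[a] == v]
        rest = [x for x in attrs if x != a]
        branches[v] = induce_tree(subset, rest, choose, majority(examples))
    return (a, branches)                  # internal node: attribute + branches
```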
Inducing Decision Trees

All twelve examples start at the root:
  +: X1, X3, X4, X6, X8, X12
  -: X2, X5, X7, X9, X10, X11

Candidate split on Patrons?
  none:  +: (none)           -: X7, X11
  some:  +: X1, X3, X6, X8   -: (none)
  full:  +: X4, X12          -: X2, X5, X9, X10

Candidate split on Type?
  French:  +: X1         -: X5
  Italian: +: X6         -: X10
  Thai:    +: X4, X8     -: X2, X11
  Burger:  +: X3, X12    -: X7, X9

Patrons? is the better split: the none and some branches are already homogeneous, while every Type? branch remains mixed.
Continuing Induction

After splitting on Patrons?:
  none: all negative -> label No
  some: all positive -> label Yes
  full: still mixed (+: X4, X12   -: X2, X5, X9, X10) -> split again, on Hungry?

Splitting the full branch on Hungry?:
  yes: +: X4, X12   -: X2, X10
  no:  +: (none)    -: X5, X9
Final Decision Tree
Patrons?
  none -> No
  some -> Yes
  full -> Hungry?
    no  -> No
    yes -> Type?
      French  -> Yes
      Italian -> No
      Thai    -> Fri/Sat?
        no  -> No
        yes -> Yes
      Burger  -> Yes
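The induced tree can be encoded and applied much like the induction sketch above; here is one illustrative nested-tuple version, reusing the hypothetical EXAMPLES encoding from the first slide:

```python
# The final tree as nested (attribute, {value: subtree}) tuples.
FINAL_TREE = ("Pat", {
    "None": "No",
    "Some": "Yes",
    "Full": ("Hun", {
        "No": "No",
        "Yes": ("Type", {
            "French": "Yes",
            "Italian": "No",
            "Thai": ("Fri", {"No": "No", "Yes": "Yes"}),
            "Burger": "Yes",
        }),
    }),
})

def classify(tree, example):
    """Follow branches matching the example until a Yes/No leaf is reached."""
    while not isinstance(tree, str):
        attr, branches = tree
        tree = branches[example[attr]]
    return tree

# Sanity check: the tree reproduces the label of all 12 training examples.
assert all(classify(FINAL_TREE, e) == e["WillWait"] for e in EXAMPLES)
```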
Decision Trees: Summary

- Finding the optimal decision tree is computationally intractable.
- We use heuristics: choosing the right attribute is the key, and the choice is based on the information content the attribute provides.
- Decision trees represent DNF boolean formulas.
- They work well in practice.
- Open issues: What to do with noise? Continuous attributes? Attributes with large domains?
Choosing an Attribute: Disorder vs. Homogeneity

[Figure: two candidate splits of the same examples. The "Bad" split leaves each group a disordered mix of positives and negatives; the "Good" split yields homogeneous groups.]
The Value of Information

"If you control the mail, you control information." - Seinfeld

Information theory lets us quantify the discriminating value of an attribute.

"It will rain in Seattle tomorrow." (boring)
"We'll have an earthquake tomorrow." (OK, I'm listening)

The value of a piece of information is inversely proportional to its probability.
Information Theory

We quantify the value of knowing that an event E occurred as $-\log_2 \Pr(E)$.

If $E_1, \ldots, E_n$ are the possible outcomes of an event, then the expected value of knowing the outcome is

$$I(\Pr(E_1), \ldots, \Pr(E_n)) = -\sum_{i=1}^{n} \Pr(E_i) \log_2 \Pr(E_i)$$

Examples:
  $I(1/2, 1/2) = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1$
  $I(0.99, 0.01) \approx 0.08$
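These two examples are easy to check in Python (the function name info is made up for these notes):

```python
from math import log2

def info(*probs):
    """I(p1, ..., pn) = -sum(p * log2(p)), skipping zero-probability terms."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(info(0.5, 0.5))    # 1.0    -> a fair coin flip is worth one full bit
print(info(0.99, 0.01))  # ~0.08  -> a near-certain outcome tells you little
```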
Why Should We Care?
Suppose we have p positive examples and n negative ones.
If I classify an example for you as positive or negative, then I’m giving you information:
$$\text{Initial} = I\!\left(\frac{p}{p+n},\ \frac{n}{p+n}\right)$$

Now let's calculate the information you would need after I gave you the value of the attribute A.
The Value of an Attribute
Suppose the attribute A can take on n values. For $A = \mathrm{val}_i$, there would still be $p_i$ positive examples and $n_i$ negative examples.

The probability that $A = \mathrm{val}_i$ is $(p_i + n_i)/(p + n)$. Hence, after I tell you the value of A, you need the following amount of information to classify an example:

$$\text{Remainder}(A) = \sum_{i=1}^{n} \frac{p_i + n_i}{p + n}\; I\!\left(\frac{p_i}{p_i + n_i},\ \frac{n_i}{p_i + n_i}\right)$$
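In code, Remainder and the resulting attribute value might look like this, building on the hypothetical info and EXAMPLES sketches above (counts, remainder, and gain are illustrative names):

```python
def counts(examples):
    """(positive, negative) counts of the WillWait goal."""
    p = sum(1 for e in examples if e["WillWait"] == "Yes")
    return p, len(examples) - p

def remainder(examples, attr):
    """Expected information still needed after splitting on attr."""
    p, n = counts(examples)
    total = 0.0
    for v in {e[attr] for e in examples}:
        subset = [e for e in examples if e[attr] == v]
        pi, ni = counts(subset)
        total += (pi + ni) / (p + n) * info(pi / (pi + ni), ni / (pi + ni))
    return total

def gain(examples, attr):
    """Initial information minus the remainder: the value of attr."""
    p, n = counts(examples)
    return info(p / (p + n), n / (p + n)) - remainder(examples, attr)
```

Passing choose = lambda ex, ats: max(ats, key=lambda a: gain(ex, a)) to the earlier induce_tree sketch turns it into information-gain-based induction.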
The Value of an Attribute (cont.)

The value of an attribute is the difference between the amount of information needed to classify an example before the split and after it, i.e., Initial - Remainder.
Patrons splits the examples as follows:
  none: +: (none)          -: X7, X11           (2 of 12)
  some: +: X1, X3, X6, X8  -: (none)            (4 of 12)
  full: +: X4, X12         -: X2, X5, X9, X10   (6 of 12)

$$\text{Remainder}(\text{Patrons}) = \frac{2}{12}\, I(0, 1) + \frac{4}{12}\, I(1, 0) + \frac{6}{12}\, I\!\left(\frac{2}{6}, \frac{4}{6}\right)$$
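Carrying the arithmetic through: $I(0,1) = I(1,0) = 0$ and $I(2/6, 4/6) \approx 0.918$, so

$$\text{Remainder}(\text{Patrons}) = \frac{6}{12} \cdot 0.918 \approx 0.459, \qquad \text{Gain}(\text{Patrons}) = 1 - 0.459 \approx 0.541 \text{ bits.}$$

The same calculation for Type gives $\text{Remainder}(\text{Type}) = 1$, since every Type branch is an even positive/negative split, hence $\text{Gain}(\text{Type}) = 0$. This is why the induction split on Patrons rather than Type at the root.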