
Page 1:

Software-Praktikum SoSe 2005
Lehrstuhl fuer Maschinelles Lernen und Natuerlich Sprachliche Systeme
Albrecht Zimmermann, Tayfun Guerel, Kristian Kersting, Prof. Dr. Luc De Raedt

Machine Learning in Games

Crash Course on Machine Learning

Page 2:

Why Machine Learning?

• Past

Computers (mostly) programmed by hand

• Future

Computers (mostly) program themselves, by interaction with their environment

Page 3:

Behavioural Cloning (German: Verhaltensimitation)

[Diagram: the player plays the game and produces game logs; a user model is learned from the logs and then plays in the player's place.]

Page 4:

Backgammon

• More than 10^20 states (boards)
• Best human players see only a small fraction of all boards during their lifetime
• Searching is hard because of dice (branching factor > 100)

Page 5:

TD-Gammon by Tesauro (1995)

Page 6:

Recent Trends

• Recent progress in algorithms and theory

• Growing flood of online data

• Computational power is available

• Growing industry

Page 7:

Three Niches for Machine Learning

• Data mining: using historical data to improve decisions
  – Medical records → medical knowledge

• Software applications we can’t program by hand
  – Autonomous driving
  – Speech recognition

• Self-customizing programs
  – Newsreader that learns user interests

Page 8:

Typical Data Mining task

• Given:
  – 9,714 patient records, each describing a pregnancy and birth
  – Each patient record contains 215 features

• Learn to predict:
  – Class of future patients at risk for Emergency Cesarean Section

Page 9:

Data Mining Result

One of 18 learned rules:

If   no previous vaginal delivery
     and abnormal 2nd Trimester Ultrasound
     and Malpresentation at admission
Then Probability of Emergency C-Section is 0.6

Accuracy over training data: 26/41 = .63
Accuracy over testing data:  12/20 = .60

Page 10:

Credit Risk Analysis

Learned Rules:

If   Other-Delinquent-Accounts > 2
     and Number-Delinquent-Billing-Cycles > 1
Then Profitable-Customer? = no

If   Other-Delinquent-Accounts = 0
     and (Income > $30k OR Years-of-credit > 3)
Then Profitable-Customer? = yes

Page 11:

Other Prediction Problems

• Process optimization

• Customer purchase behavior

• Customer retention

Page 12:

Problems Too Difficult to Program by Hand

• ALVINN [Pomerleau] drives at 70 mph on highways

Page 13:

Problems Too Difficult to Program by Hand

• ALVINN [Pomerleau] drives at 70 mph on highways

Page 14:

Software that Customizes to User

Page 15:

Lehrstuhl fuer Maschinelles Lernen und Natuerlich Sprachliche Systeme
Albrecht Zimmermann, Tayfun Guerel, Kristian Kersting, Prof. Dr. Luc De Raedt

Machine Learning in Games

Crash Course on Decision Tree Learning

Refund?
  Yes -> NO
  No  -> MarSt?
           Married          -> NO
           Single, Divorced -> TaxInc?
                                 < 80K -> NO
                                 > 80K -> YES

Page 16:

Classification: Definition

• Given a collection of records (training set)
  – Each record contains a set of attributes; one of the attributes is the class.

• Find a model that predicts the class attribute as a function of the values of the other attributes.

• Goal: previously unseen records should be assigned a class as accurately as possible.
  – A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
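A minimal sketch of this train/test workflow in Python with scikit-learn (the toy dataset, the 70/30 split, and the choice of DecisionTreeClassifier are illustrative assumptions, not taken from the slides):

# Illustrative sketch only: the slides do not prescribe a library or dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Records with attributes; one attribute (y) is the class.
X, y = load_breast_cancer(return_X_y=True)

# Divide the given data into a training set (build the model) and a test set (validate it).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier(random_state=0)   # illustrative model choice
model.fit(X_train, y_train)                      # induction: learn the model
predictions = model.predict(X_test)              # deduction: apply the model
print("test accuracy:", accuracy_score(y_test, predictions))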

Page 17:

Illustrating Classification Task

[Diagram: Training Set -> Learning algorithm -> (Induction) Learn Model -> Model -> (Deduction) Apply Model -> Test Set]

Training Set:

Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set:

Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

Page 18:

Examples of Classification Task

• Predicting tumor cells as benign or malignant

• Classifying credit card transactions as legitimate or fraudulent

• Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil

• Categorizing news stories as finance, weather, entertainment, sports, etc.

Page 19:

Classification Techniques

• Decision Tree based Methods
• Rule-based Methods
• Instance-Based Learners
• Neural Networks
• Bayesian Networks
• (Conditional) Random Fields
• Support Vector Machines
• Inductive Logic Programming
• Statistical Relational Learning
• …

Page 20:

Decision Tree for PlayTennis

Outlook?
  Sunny    -> Humidity?
                High   -> No
                Normal -> Yes
  Overcast -> Yes
  Rain     -> Wind?
                Strong -> No
                Weak   -> Yes

Page 21:

Decision Tree for PlayTennis

Outlook?
  Sunny    -> Humidity?
                High   -> No
                Normal -> Yes
  Overcast -> …
  Rain     -> …

Each internal node tests an attribute

Each branch corresponds to an attribute value

Each leaf node assigns a classification

Page 22:

Decision Tree for PlayTennis

Outlook?
  Sunny    -> Humidity?
                High   -> No
                Normal -> Yes
  Overcast -> Yes
  Rain     -> Wind?
                Strong -> No
                Weak   -> Yes

Outlook  Temperature  Humidity  Wind  PlayTennis
Sunny    Hot          High      Weak  ?  ->  No  (Outlook=Sunny, Humidity=High)

Page 23:

Decision Tree for Conjunction

Outlook=Sunny ∧ Wind=Weak

Outlook?
  Sunny    -> Wind?
                Strong -> No
                Weak   -> Yes
  Overcast -> No
  Rain     -> No

Page 24:

Decision Tree for Disjunction

Outlook=Sunny ∨ Wind=Weak

Outlook?
  Sunny    -> Yes
  Overcast -> Wind?
                Strong -> No
                Weak   -> Yes
  Rain     -> Wind?
                Strong -> No
                Weak   -> Yes

Page 25:

Decision Tree for XOR

Outlook=Sunny XOR Wind=Weak

Outlook?
  Sunny    -> Wind?
                Strong -> Yes
                Weak   -> No
  Overcast -> Wind?
                Strong -> No
                Weak   -> Yes
  Rain     -> Wind?
                Strong -> No
                Weak   -> Yes

Page 26:

Decision Tree

Outlook?
  Sunny    -> Humidity?
                High   -> No
                Normal -> Yes
  Overcast -> Yes
  Rain     -> Wind?
                Strong -> No
                Weak   -> Yes

• Decision trees represent disjunctions of conjunctions:

(Outlook=Sunny ∧ Humidity=Normal) ∨ (Outlook=Overcast) ∨ (Outlook=Rain ∧ Wind=Weak)
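As a sketch, the tree above is equivalent to the following Boolean function in Python (attribute names and values taken from the slide; the function name is my own):

def play_tennis(outlook, humidity, wind):
    # Disjunction of conjunctions encoded by the PlayTennis tree above.
    return ((outlook == "Sunny" and humidity == "Normal")
            or outlook == "Overcast"
            or (outlook == "Rain" and wind == "Weak"))

print(play_tennis("Sunny", "High", "Weak"))   # False -> PlayTennis = No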

Page 27:

When to consider Decision Trees

• Instances describable by attribute-value pairs

• Target function is discrete valued

• Disjunctive hypothesis may be required

• Possibly noisy training data

• Missing attribute values

• Examples:
  – Medical diagnosis
  – Credit risk analysis
  – RTS games?

Page 28:

Decision Tree Induction

• Many algorithms:
  – Hunt’s Algorithm (one of the earliest)
  – CART
  – ID3, C4.5
  – …

Page 29:

Top-Down Induction of Decision Trees ID3

1. A ← the “best” decision attribute for the next node

2. Assign A as the decision attribute for the node

3. For each value of A, create a new descendant

4. Sort the training examples to the leaf nodes according to the attribute value of the branch

5. If all training examples are perfectly classified (same value of the target attribute), stop; else iterate over the new leaf nodes.
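A compact, self-contained Python sketch of these five steps (function and variable names are my own; the “best” attribute is chosen by the information gain defined on the following slides):

import math
from collections import Counter

def entropy(labels):
    """Impurity of a list of class labels (see the entropy slides below)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def id3(examples, target, attributes):
    """examples: list of dicts mapping attribute name -> value."""
    labels = [ex[target] for ex in examples]
    # Step 5: stop if all examples have the same target value.
    if len(set(labels)) == 1:
        return labels[0]
    if not attributes:
        return Counter(labels).most_common(1)[0][0]

    # Steps 1-2: pick the attribute with the highest information gain.
    def gain(a):
        remainder = 0.0
        for v in set(ex[a] for ex in examples):
            subset = [ex[target] for ex in examples if ex[a] == v]
            remainder += len(subset) / len(examples) * entropy(subset)
        return entropy(labels) - remainder

    best = max(attributes, key=gain)

    # Steps 3-4: create a descendant per value and sort examples to it.
    tree = {best: {}}
    for v in set(ex[best] for ex in examples):
        subset = [ex for ex in examples if ex[best] == v]
        rest = [a for a in attributes if a != best]
        tree[best][v] = id3(subset, target, rest)
    return tree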

Page 30:

Which Attribute is ”best”?

• Example:
  – 2 attributes, 1 class variable
  – 64 examples: 29+, 35-

A1=?  [29+, 35-]
  True  -> [21+, 5-]
  False -> [8+, 30-]

A2=?  [29+, 35-]
  True  -> [18+, 33-]
  False -> [11+, 2-]

Page 31:

Entropy

• S is a sample of training examples

• p+ is the proportion of positive examples

• p- is the proportion of negative examples

• Entropy measures the impurity of S

Entropy(S) = -p+ log2 p+ - p- log2 p-
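A minimal sketch of this formula in Python (the convention 0·log 0 = 0 is handled explicitly; the [29+, 35-] example comes from the later slides):

import math

def entropy(pos, neg):
    """Entropy(S) = -p+ log2 p+ - p- log2 p-   (0 log 0 taken as 0)."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        p = count / total
        if p > 0:
            result -= p * math.log2(p)
    return result

print(round(entropy(29, 35), 2))   # 0.99, i.e. Entropy([29+, 35-])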

Page 32:

Entropy

• Entropy(S) = expected number of bits needed to encode class (+ or -) of randomly drawn members of S (under the optimal, shortest length-code)

• Information theory: optimal length code assigns

–log2 p bits to messages having probability p.

• So, the expected number of bits to encode (+ or -) of random member of S:

-p+ log2 p+ - p- log2 p-

(with 0 · log 0 taken to be 0)

Page 33:

Information Gain

• Gain(S,A): expected reduction in entropy due to sorting S on attribute A

A1=?  [29+, 35-]
  True  -> [21+, 5-]
  False -> [8+, 30-]

A2=?  [29+, 35-]
  True  -> [18+, 33-]
  False -> [11+, 2-]

Gain(S,A) = Entropy(S) − Σ_{v ∈ values(A)} |S_v|/|S| · Entropy(S_v)

Entropy([29+,35-]) = -29/64 log2 29/64 – 35/64 log2 35/64 = 0.99

Page 34:

Information Gain

Entropy(S) = Entropy([29+,35-]) = 0.99

A1=?  [29+, 35-]
  True  -> [21+, 5-]
  False -> [8+, 30-]

Entropy([21+,5-]) = 0.71
Entropy([8+,30-]) = 0.74
Gain(S,A1) = Entropy(S) − 26/64·Entropy([21+,5-]) − 38/64·Entropy([8+,30-]) = 0.27

A2=?  [29+, 35-]
  True  -> [18+, 33-]
  False -> [11+, 2-]

Entropy([18+,33-]) = 0.94
Entropy([11+,2-]) = 0.62
Gain(S,A2) = Entropy(S) − 51/64·Entropy([18+,33-]) − 13/64·Entropy([11+,2-]) = 0.12
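A short sketch that reproduces the numbers above (the entropy helper restates the formula from the earlier slide; function names and the count-tuple representation are my own):

import math

def entropy(pos, neg):
    total = pos + neg
    return -sum(p * math.log2(p)
                for p in (pos / total, neg / total) if p > 0)

def gain(parent, splits):
    """Gain(S,A) = Entropy(S) - sum_v |Sv|/|S| * Entropy(Sv)."""
    total = sum(pos + neg for pos, neg in splits)
    return entropy(*parent) - sum(
        (pos + neg) / total * entropy(pos, neg) for pos, neg in splits)

# A1 splits [29+,35-] into [21+,5-] and [8+,30-]; A2 into [18+,33-] and [11+,2-].
print(round(gain((29, 35), [(21, 5), (8, 30)]), 2))    # 0.27
print(round(gain((29, 35), [(18, 33), (11, 2)]), 2))   # 0.12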

Page 35:

Another Example

• 14 training examples (9+, 5-): days for playing tennis
  – Wind: weak, strong
  – Humidity: high, normal

Page 36:

Another Example

S = [9+, 5-], E = 0.940

Humidity?
  High   -> [3+, 4-], E = 0.985
  Normal -> [6+, 1-], E = 0.592

Gain(S, Humidity) = 0.940 − (7/14)·0.985 − (7/14)·0.592 = 0.151

Wind?
  Weak   -> [6+, 2-], E = 0.811
  Strong -> [3+, 3-], E = 1.0

Gain(S, Wind) = 0.940 − (8/14)·0.811 − (6/14)·1.0 = 0.048

Page 37:

Yet Another Example: Playing Tennis

Day  Outlook   Temp.  Humidity  Wind    PlayTennis
D1   Sunny     Hot    High      Weak    No
D2   Sunny     Hot    High      Strong  No
D3   Overcast  Hot    High      Weak    Yes
D4   Rain      Mild   High      Weak    Yes
D5   Rain      Cool   Normal    Weak    Yes
D6   Rain      Cool   Normal    Strong  No
D7   Overcast  Cool   Normal    Weak    Yes
D8   Sunny     Mild   High      Weak    No
D9   Sunny     Cool   Normal    Weak    Yes
D10  Rain      Mild   Normal    Weak    Yes
D11  Sunny     Mild   Normal    Strong  Yes
D12  Overcast  Mild   High      Strong  Yes
D13  Overcast  Hot    Normal    Weak    Yes
D14  Rain      Mild   High      Strong  No
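The same table as Python data, a sketch for experimenting (keys follow the column headers above; the attribute list and variable names are my own). Feeding this list to the id3 sketch from the ID3 slide reproduces the tree shown on the later “ID3 Algorithm” slide, with Outlook at the root:

# Values transcribed from the table above.
play_tennis_examples = [
    {"Day": "D1",  "Outlook": "Sunny",    "Temp": "Hot",  "Humidity": "High",   "Wind": "Weak",   "PlayTennis": "No"},
    {"Day": "D2",  "Outlook": "Sunny",    "Temp": "Hot",  "Humidity": "High",   "Wind": "Strong", "PlayTennis": "No"},
    {"Day": "D3",  "Outlook": "Overcast", "Temp": "Hot",  "Humidity": "High",   "Wind": "Weak",   "PlayTennis": "Yes"},
    {"Day": "D4",  "Outlook": "Rain",     "Temp": "Mild", "Humidity": "High",   "Wind": "Weak",   "PlayTennis": "Yes"},
    {"Day": "D5",  "Outlook": "Rain",     "Temp": "Cool", "Humidity": "Normal", "Wind": "Weak",   "PlayTennis": "Yes"},
    {"Day": "D6",  "Outlook": "Rain",     "Temp": "Cool", "Humidity": "Normal", "Wind": "Strong", "PlayTennis": "No"},
    {"Day": "D7",  "Outlook": "Overcast", "Temp": "Cool", "Humidity": "Normal", "Wind": "Weak",   "PlayTennis": "Yes"},
    {"Day": "D8",  "Outlook": "Sunny",    "Temp": "Mild", "Humidity": "High",   "Wind": "Weak",   "PlayTennis": "No"},
    {"Day": "D9",  "Outlook": "Sunny",    "Temp": "Cool", "Humidity": "Normal", "Wind": "Weak",   "PlayTennis": "Yes"},
    {"Day": "D10", "Outlook": "Rain",     "Temp": "Mild", "Humidity": "Normal", "Wind": "Weak",   "PlayTennis": "Yes"},
    {"Day": "D11", "Outlook": "Sunny",    "Temp": "Mild", "Humidity": "Normal", "Wind": "Strong", "PlayTennis": "Yes"},
    {"Day": "D12", "Outlook": "Overcast", "Temp": "Mild", "Humidity": "High",   "Wind": "Strong", "PlayTennis": "Yes"},
    {"Day": "D13", "Outlook": "Overcast", "Temp": "Hot",  "Humidity": "Normal", "Wind": "Weak",   "PlayTennis": "Yes"},
    {"Day": "D14", "Outlook": "Rain",     "Temp": "Mild", "Humidity": "High",   "Wind": "Strong", "PlayTennis": "No"},
]

attributes = ["Outlook", "Temp", "Humidity", "Wind"]
# id3(play_tennis_examples, "PlayTennis", attributes) -> tree with Outlook at the root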

Page 38:

PlayTennis - Selecting Next Attribute

S = [9+, 5-], E = 0.940

Outlook?
  Sunny    -> [2+, 3-], E = 0.971
  Overcast -> [4+, 0-], E = 0.0
  Rain     -> [3+, 2-], E = 0.971

Gain(S, Outlook) = 0.940 − (5/14)·0.971 − (4/14)·0.0 − (5/14)·0.971 = 0.247
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048
Gain(S, Temp) = 0.029

Page 39:

PlayTennis - ID3 Algorithm

[D1, D2, …, D14]  [9+, 5-]

Outlook?
  Sunny    -> S_sunny = [D1, D2, D8, D9, D11]  [2+, 3-]  -> ?
  Overcast -> [D3, D7, D12, D13]  [4+, 0-]  -> Yes
  Rain     -> [D4, D5, D6, D10, D14]  [3+, 2-]  -> ?

Gain(S_sunny, Humidity) = 0.970 − (3/5)·0.0 − (2/5)·0.0 = 0.970
Gain(S_sunny, Temp.)    = 0.970 − (2/5)·0.0 − (2/5)·1.0 − (1/5)·0.0 = 0.570
Gain(S_sunny, Wind)     = 0.970 − (2/5)·1.0 − (3/5)·0.918 = 0.019

Page 40:

ID3 Algorithm

Outlook?
  Sunny    -> Humidity?
                High   -> No   [D1, D2]
                Normal -> Yes  [D8, D9, D11]
  Overcast -> Yes  [D3, D7, D12, D13]
  Rain     -> Wind?
                Strong -> No   [D6, D14]
                Weak   -> Yes  [D4, D5, D10]

Page 41:

Hypothesis Space Search ID3

[Figure: ID3’s greedy, simple-to-complex search through the space of decision trees, growing partial trees by adding one attribute test (A1, A2, A3, A4, …) at a time.]

Page 42:

Hypothesis Space Search ID3

• Hypothesis space is complete!
  – The target function is surely in there…

• Outputs a single hypothesis

• No backtracking on selected attributes (greedy search)
  – Local minima (suboptimal splits)

• Statistically-based search choices
  – Robust to noisy data

• Inductive bias (search bias)
  – Prefer shorter trees over longer ones
  – Place high information gain attributes close to the root

Page 43:

Converting a Tree to Rules

Outlook?
  Sunny    -> Humidity?
                High   -> No
                Normal -> Yes
  Overcast -> Yes
  Rain     -> Wind?
                Strong -> No
                Weak   -> Yes

R1: If (Outlook=Sunny) ∧ (Humidity=High) Then PlayTennis=No
R2: If (Outlook=Sunny) ∧ (Humidity=Normal) Then PlayTennis=Yes
R3: If (Outlook=Overcast) Then PlayTennis=Yes
R4: If (Outlook=Rain) ∧ (Wind=Strong) Then PlayTennis=No
R5: If (Outlook=Rain) ∧ (Wind=Weak) Then PlayTennis=Yes
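A small sketch of this conversion for a tree stored as nested dicts (the format produced by the id3 sketch on the earlier slide; the function name and output wording are my own):

def tree_to_rules(tree, conditions=()):
    # Leaf: emit one "If ... Then ..." rule for the path that led here.
    if not isinstance(tree, dict):
        body = " ∧ ".join(f"({a}={v})" for a, v in conditions)
        return [f"If {body} Then PlayTennis={tree}"]
    rules = []
    (attribute, branches), = tree.items()   # one attribute test per internal node
    for value, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, conditions + ((attribute, value),)))
    return rules

# Applied to the PlayTennis tree above, this yields rules R1-R5, e.g.
#   If (Outlook=Sunny) ∧ (Humidity=High) Then PlayTennis=No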

Page 44:

Conclusions

1. Decision tree learning provides a practical method for concept learning.

2. ID3-like algorithms search a complete hypothesis space.

3. The inductive bias of decision trees is a preference (search) bias.

4. Overfitting the training data (you will see it ;-)) is an important issue in decision tree learning.

5. A large number of extensions of the ID3 algorithm have been proposed for overfitting avoidance, handling missing attributes, handling numerical attributes, etc. (feel free to try them out).