Software-Praktikum SoSe 2005, Lehrstuhl fuer Maschinelles Lernen und Natuerlich Sprachliche Systeme



Lehrstuhl fuer Maschinelles Lernen und Natuerlich Sprachliche Systeme
Albrecht Zimmermann, Tayfun Guerel, Kristian Kersting, Prof. Dr. Luc De Raedt

Machine Learning in Games

Crash Course on Machine Learning


Why Machine Learning?

• Past

Computers (mostly) programmed by hand

• Future

Computers (mostly) program themselves, by interaction with their environment


Behavioural Cloning (German: Verhaltensimitation)

[Diagram: the player plays the game and produces logs; from the logs a user model is learned, which then plays in the player's place.]


Backgammon

• More than 10^20 states (boards)
• Best human players see only a small fraction of all boards during their lifetime
• Searching is hard because of dice (branching factor > 100)


TD-Gammon by Tesauro (1995)


Recent Trends

• Recent progress in algorithms and theory

• Growing flood of online data

• Computational power is available

• Growing industry


Three Niches for Machine Learning

• Data mining: using historical data to improve decisions
  – Medical records -> medical knowledge
• Software applications we can’t program by hand
  – Autonomous driving
  – Speech recognition
• Self-customizing programs
  – Newsreader that learns user interests


Typical Data Mining task

• Given:
  – 9,714 patient records, each describing a pregnancy and birth
  – Each patient record contains 215 features
• Learn to predict:
  – Class of future patients at risk for Emergency Cesarean Section


Data Mining Result

One of 18 learned rules:

If   no previous vaginal delivery, and
     abnormal 2nd Trimester Ultrasound, and
     malpresentation at admission
Then probability of Emergency C-Section is 0.6

Accuracy over training data: 26/41 = .63
Accuracy over test data: 12/20 = .60


Credit Risk Analysis

Learned rules:

If   Other-Delinquent-Accounts > 2, and
     Number-Delinquent-Billing-Cycles > 1
Then Profitable-Customer? = no

If   Other-Delinquent-Accounts = 0, and
     (Income > $30k OR Years-of-credit > 3)
Then Profitable-Customer? = yes


Other Prediction Problems

• Process optimization
• Customer purchase behavior
• Customer retention


Problems Too Difficult to Program by Hand

• ALVINN [Pomerleau] drives 70 mph on highways




Software that Customizes to User


Lehrstuhl fuer Maschinelles Lernen und Natuerlich Sprachliche Systeme
Albrecht Zimmermann, Tayfun Guerel, Kristian Kersting, Prof. Dr. Luc De Raedt

Machine Learning in Games

Crash Course on Decision Tree Learning

[Example decision tree:
 Refund = Yes: NO
 Refund = No:
   MarSt = Married: NO
   MarSt = Single, Divorced:
     TaxInc < 80K: NO
     TaxInc > 80K: YES]


Classification: Definition

• Given a collection of records (the training set)
  – Each record contains a set of attributes; one of the attributes is the class.
• Find a model for the class attribute as a function of the values of the other attributes.
• Goal: previously unseen records should be assigned a class as accurately as possible.
  – A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
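As a rough sketch of the train/test protocol just described (not from the slides; the majority-class "model" and the helper names are made up for illustration), the evaluation loop might look like this in Python:

# Sketch of the train/test protocol described above. `records` is a list of
# (attributes, class_label) pairs; the "model" is just a majority-class
# baseline standing in for any real learner.
import random
from collections import Counter

def train_test_split(records, test_fraction=0.3, seed=0):
    rows = records[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]              # (training set, test set)

def accuracy(test_set, predict):
    hits = sum(1 for attrs, label in test_set if predict(attrs) == label)
    return hits / len(test_set)

records = [({"Refund": "Yes"}, "No"), ({"Refund": "No"}, "Yes")] * 10
train, test = train_test_split(records)
majority = Counter(label for _, label in train).most_common(1)[0][0]
print(accuracy(test, lambda attrs: majority))  # baseline accuracy on the test set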


Illustrating Classification Task

[Diagram: a learning algorithm performs induction on the Training Set to learn a Model; the Model is then applied (deduction) to the Test Set.]

Training Set:
 Tid  Attrib1  Attrib2  Attrib3  Class
  1   Yes      Large    125K     No
  2   No       Medium   100K     No
  3   No       Small     70K     No
  4   Yes      Medium   120K     No
  5   No       Large     95K     Yes
  6   No       Medium    60K     No
  7   Yes      Large    220K     No
  8   No       Small     85K     Yes
  9   No       Medium    75K     No
 10   No       Small     90K     Yes

Test Set:
 Tid  Attrib1  Attrib2  Attrib3  Class
 11   No       Small     55K     ?
 12   Yes      Medium    80K     ?
 13   Yes      Large    110K     ?
 14   No       Small     95K     ?
 15   No       Large     67K     ?


Examples of Classification Task

• Predicting tumor cells as benign or malignant

• Classifying credit card transactions as legitimate or fraudulent

• Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil

• Categorizing news stories as finance, weather, entertainment, sports, etc.


Classification Techniques

• Decision Tree based Methods
• Rule-based Methods
• Instance-Based Learners
• Neural Networks
• Bayesian Networks
• (Conditional) Random Fields
• Support Vector Machines
• Inductive Logic Programming
• Statistical Relational Learning
• …


Decision Tree for PlayTennis

Outlook = Sunny:
  Humidity = High: No
  Humidity = Normal: Yes
Outlook = Overcast: Yes
Outlook = Rain:
  Wind = Strong: No
  Wind = Weak: Yes


Decision Tree for PlayTennis

Outlook = Sunny:
  Humidity = High: No
  Humidity = Normal: Yes
Outlook = Overcast: ...
Outlook = Rain: ...

• Each internal node tests an attribute
• Each branch corresponds to an attribute value
• Each leaf node assigns a classification



Decision Tree for PlayTennis

Outlook = Sunny:
  Humidity = High: No
  Humidity = Normal: Yes
Outlook = Overcast: Yes
Outlook = Rain:
  Wind = Strong: No
  Wind = Weak: Yes

Classify the new example:
 Outlook  Temperature  Humidity  Wind  PlayTennis
 Sunny    Hot          High      Weak  ?
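To make the lookup concrete, here is a small Python sketch (not part of the original slides) that encodes the tree above as nested dicts and classifies the new example; the encoding and function names are illustrative only:

# The PlayTennis tree above, encoded as nested dicts:
# an inner node is {attribute: {value: subtree, ...}}, a leaf is "Yes"/"No".
tree = {"Outlook": {
    "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain":     {"Wind": {"Strong": "No", "Weak": "Yes"}},
}}

def classify(node, example):
    # Walk from the root, following the branch matching the example,
    # until a leaf (a plain class label) is reached.
    while isinstance(node, dict):
        attribute, branches = next(iter(node.items()))
        node = branches[example[attribute]]
    return node

query = {"Outlook": "Sunny", "Temperature": "Hot", "Humidity": "High", "Wind": "Weak"}
print(classify(tree, query))  # "No": Sunny branch, then Humidity = High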


Decision Tree for Conjunction

Outlook=Sunny ∧ Wind=Weak

Outlook = Sunny:
  Wind = Strong: No
  Wind = Weak: Yes
Outlook = Overcast: No
Outlook = Rain: No


Decision Tree for Disjunction

Outlook=Sunny ∨ Wind=Weak

Outlook = Sunny: Yes
Outlook = Overcast:
  Wind = Strong: No
  Wind = Weak: Yes
Outlook = Rain:
  Wind = Strong: No
  Wind = Weak: Yes


Decision Tree for XOR

Outlook=Sunny XOR Wind=Weak

Outlook = Sunny:
  Wind = Strong: Yes
  Wind = Weak: No
Outlook = Overcast:
  Wind = Strong: No
  Wind = Weak: Yes
Outlook = Rain:
  Wind = Strong: No
  Wind = Weak: Yes


Decision Tree

Outlook = Sunny:
  Humidity = High: No
  Humidity = Normal: Yes
Outlook = Overcast: Yes
Outlook = Rain:
  Wind = Strong: No
  Wind = Weak: Yes

• Decision trees represent disjunctions of conjunctions:
  (Outlook=Sunny ∧ Humidity=Normal) ∨ (Outlook=Overcast) ∨ (Outlook=Rain ∧ Wind=Weak)


When to consider Decision Trees

• Instances describable by attribute-value pairs

• Target function is discrete valued

• Disjunctive hypothesis may be required

• Possibly noisy training data

• Missing attribute values

• Examples:
  – Medical diagnosis
  – Credit risk analysis
  – RTS games?


Decision Tree Induction

• Many algorithms:
  – Hunt’s Algorithm (one of the earliest)
  – CART
  – ID3, C4.5
  – …


Top-Down Induction of Decision Trees ID3

1. A <- the “best” decision attribute for the next node
2. Assign A as the decision attribute for the node
3. For each value of A, create a new descendant
4. Sort the training examples to the leaf nodes according to the attribute value of the branch
5. If all training examples are perfectly classified (same value of the target attribute) stop, else iterate over the new leaf nodes.
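The following Python sketch is one possible reading of these five steps (an illustrative reimplementation, not the original ID3 code); entropy and information gain are the measures introduced on the next slides:

# Recursive ID3 sketch: choose the attribute with the highest information
# gain, split the examples on its values, and recurse until a node is pure
# (or no attributes remain, in which case the majority class is used).
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attribute, target):
    gain = entropy([ex[target] for ex in examples])
    for value in {ex[attribute] for ex in examples}:
        subset = [ex[target] for ex in examples if ex[attribute] == value]
        gain -= len(subset) / len(examples) * entropy(subset)
    return gain

def id3(examples, attributes, target):
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:                        # step 5: perfectly classified
        return labels[0]
    if not attributes:                               # fallback: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes,                           # steps 1-2: pick the "best" attribute
               key=lambda a: information_gain(examples, a, target))
    node = {best: {}}
    for value in {ex[best] for ex in examples}:      # step 3: one child per value
        subset = [ex for ex in examples if ex[best] == value]   # step 4: sort examples
        node[best][value] = id3(subset, [a for a in attributes if a != best], target)
    return node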


Which Attribute is ”best”?

Split on A1: S = [29+, 35-]
  A1 = True:  [21+, 5-]
  A1 = False: [8+, 30-]

Split on A2: S = [29+, 35-]
  A2 = True:  [18+, 33-]
  A2 = False: [11+, 2-]

• Example:
  – 2 attributes, 1 class variable
  – 64 examples: 29+, 35-


Entropy

• S is a sample of training examples

• p+ is the proportion of positive examples

• p- is the proportion of negative examples

• Entropy measures the impurity of S

Entropy(S) = -p+ log2 p+ - p- log2 p-


Entropy

• Entropy(S) = expected number of bits needed to encode the class (+ or -) of a randomly drawn member of S (under the optimal, shortest-length code)

• Information theory: an optimal-length code assigns -log2 p bits to a message having probability p.

• So the expected number of bits to encode the class (+ or -) of a random member of S is:

  -p+ log2 p+ - p- log2 p-

  (with 0 log 0 = 0)
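A quick numeric check of the formula (illustrative Python, not from the slides); the two sample class distributions are the ones used in the examples that follow:

# Numeric check of the entropy formula (0 log 0 is treated as 0).
from math import log2

def entropy(p_pos, p_neg):
    return -sum(p * log2(p) for p in (p_pos, p_neg) if p > 0)

print(f"{entropy(29/64, 35/64):.2f}")   # 0.99  -- the [29+, 35-] sample used below
print(f"{entropy(9/14, 5/14):.3f}")     # 0.940 -- the PlayTennis sample [9+, 5-]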


Information Gain

• Gain(S, A): expected reduction in entropy due to sorting S on attribute A

  Gain(S, A) = Entropy(S) - Σ_{v ∈ values(A)} |S_v|/|S| · Entropy(S_v)

Split on A1: S = [29+, 35-]
  A1 = True:  [21+, 5-]
  A1 = False: [8+, 30-]

Split on A2: S = [29+, 35-]
  A2 = True:  [18+, 33-]
  A2 = False: [11+, 2-]

Entropy([29+, 35-]) = -29/64 log2(29/64) - 35/64 log2(35/64) = 0.99


Information Gain

Entropy(S) = Entropy([29+, 35-]) = 0.99

Split on A1 (True: [21+, 5-], False: [8+, 30-]):
  Entropy([21+, 5-]) = 0.71
  Entropy([8+, 30-]) = 0.74
  Gain(S, A1) = Entropy(S) - 26/64 * Entropy([21+, 5-]) - 38/64 * Entropy([8+, 30-]) = 0.27

Split on A2 (True: [18+, 33-], False: [11+, 2-]):
  Entropy([18+, 33-]) = 0.94
  Entropy([11+, 2-]) = 0.62
  Gain(S, A2) = Entropy(S) - 51/64 * Entropy([18+, 33-]) - 13/64 * Entropy([11+, 2-]) = 0.12
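The same numbers can be reproduced with a few lines of Python (a sketch for checking the arithmetic; the function names are made up):

# Recomputing Gain(S, A1) and Gain(S, A2) from the class counts above.
from math import log2

def entropy(pos, neg):
    n = pos + neg
    return -sum(p * log2(p) for p in (pos / n, neg / n) if p > 0)

def gain(parent, splits):
    n = sum(pos + neg for pos, neg in splits)
    return entropy(*parent) - sum((pos + neg) / n * entropy(pos, neg)
                                  for pos, neg in splits)

print(f"{gain((29, 35), [(21, 5), (8, 30)]):.2f}")   # Gain(S, A1) = 0.27
print(f"{gain((29, 35), [(18, 33), (11, 2)]):.2f}")  # Gain(S, A2) = 0.12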


Another Example

• 14 training examples (9+, 5-): days on which tennis was or was not played
  – Wind: weak, strong
  – Humidity: high, normal


Another Example

S = [9+, 5-], E = 0.940

Split on Humidity:
  High:   [3+, 4-], E = 0.985
  Normal: [6+, 1-], E = 0.592
  Gain(S, Humidity) = 0.940 - (7/14)*0.985 - (7/14)*0.592 = 0.151

Split on Wind:
  Weak:   [6+, 2-], E = 0.811
  Strong: [3+, 3-], E = 1.0
  Gain(S, Wind) = 0.940 - (8/14)*0.811 - (6/14)*1.0 = 0.048


Yet Another Example: Playing Tennis

Day  Outlook   Temp.  Humidity  Wind    PlayTennis
D1   Sunny     Hot    High      Weak    No
D2   Sunny     Hot    High      Strong  No
D3   Overcast  Hot    High      Weak    Yes
D4   Rain      Mild   High      Weak    Yes
D5   Rain      Cool   Normal    Weak    Yes
D6   Rain      Cool   Normal    Strong  No
D7   Overcast  Cool   Normal    Strong  Yes
D8   Sunny     Mild   High      Weak    No
D9   Sunny     Cool   Normal    Weak    Yes
D10  Rain      Mild   Normal    Weak    Yes
D11  Sunny     Mild   Normal    Strong  Yes
D12  Overcast  Mild   High      Strong  Yes
D13  Overcast  Hot    Normal    Weak    Yes
D14  Rain      Mild   High      Strong  No


PlayTennis - Selecting Next Attribute

S = [9+, 5-], E = 0.940

Split on Outlook:
  Sunny:    [2+, 3-], E = 0.971
  Overcast: [4+, 0-], E = 0.0
  Rain:     [3+, 2-], E = 0.971
  Gain(S, Outlook) = 0.940 - (5/14)*0.971 - (4/14)*0.0 - (5/14)*0.971 = 0.247

Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048
Gain(S, Temp) = 0.029
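As a cross-check (illustrative Python, not from the slides), the gains on this slide can be recomputed directly from the 14-day table above:

# The 14 PlayTennis days from the table above, used to recheck the gains.
from collections import Counter
from math import log2

days = [  # (Outlook, Temp, Humidity, Wind, PlayTennis)
    ("Sunny", "Hot", "High", "Weak", "No"),          ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),      ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),       ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),      ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),    ("Rain", "Mild", "High", "Strong", "No"),
]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain(rows, col):
    g = entropy([r[-1] for r in rows])
    for value in {r[col] for r in rows}:
        subset = [r[-1] for r in rows if r[col] == value]
        g -= len(subset) / len(rows) * entropy(subset)
    return g

for name, col in [("Outlook", 0), ("Temp", 1), ("Humidity", 2), ("Wind", 3)]:
    print(f"Gain(S, {name}) = {gain(days, col):.3f}")
# Outlook 0.247, Temp 0.029, Humidity 0.152 (0.151 above, from rounded
# intermediate values), Wind 0.048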


PlayTennis - ID3 Algorithm

Root split on Outlook over [D1, D2, ..., D14] = [9+, 5-]:
  Sunny:    S_sunny = [D1, D2, D8, D9, D11] = [2+, 3-]  -> ?
  Overcast: [D3, D7, D12, D13] = [4+, 0-]               -> Yes
  Rain:     [D4, D5, D6, D10, D14] = [3+, 2-]           -> ?

Which attribute should be tested next on the Sunny branch?
  Gain(S_sunny, Humidity) = 0.970 - (3/5)*0.0 - (2/5)*0.0 = 0.970
  Gain(S_sunny, Temp.)    = 0.970 - (2/5)*0.0 - (2/5)*1.0 - (1/5)*0.0 = 0.570
  Gain(S_sunny, Wind)     = 0.970 - (2/5)*1.0 - (3/5)*0.918 = 0.019


ID3 Algorithm

Outlook = Sunny:
  Humidity = High: No     [D1, D2, D8]
  Humidity = Normal: Yes  [D9, D11]
Outlook = Overcast: Yes   [D3, D7, D12, D13]
Outlook = Rain:
  Wind = Strong: No       [D6, D14]
  Wind = Weak: Yes        [D4, D5, D10]


Hypothesis Space Search ID3

[Figure: ID3 searches the space of decision trees greedily, starting from the empty tree and repeatedly extending the current tree by one attribute test (A1, then A2, A3, A4, ...), guided by the training examples (+/-).]


Hypothesis Space Search ID3

• Hypothesis space is complete!
  – The target function is surely in there…
• Outputs a single hypothesis
• No backtracking on selected attributes (greedy search)
  – Local minima (suboptimal splits)
• Statistically-based search choices
  – Robust to noisy data
• Inductive bias (search bias)
  – Prefer shorter trees over longer ones
  – Place high information gain attributes close to the root


Converting a Tree to Rules

Outlook = Sunny:
  Humidity = High: No
  Humidity = Normal: Yes
Outlook = Overcast: Yes
Outlook = Rain:
  Wind = Strong: No
  Wind = Weak: Yes

R1: If (Outlook=Sunny) ∧ (Humidity=High)   Then PlayTennis=No
R2: If (Outlook=Sunny) ∧ (Humidity=Normal) Then PlayTennis=Yes
R3: If (Outlook=Overcast)                  Then PlayTennis=Yes
R4: If (Outlook=Rain) ∧ (Wind=Strong)      Then PlayTennis=No
R5: If (Outlook=Rain) ∧ (Wind=Weak)        Then PlayTennis=Yes
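A small sketch of this conversion (illustrative Python; it reuses the nested-dict tree encoding from the earlier classification sketch): each root-to-leaf path becomes one rule.

# Emit one rule per root-to-leaf path of a nested-dict decision tree.
tree = {"Outlook": {
    "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain":     {"Wind": {"Strong": "No", "Weak": "Yes"}},
}}

def tree_to_rules(node, conditions=()):
    if not isinstance(node, dict):                   # leaf: close the rule
        tests = " and ".join(f"({a}={v})" for a, v in conditions)
        return [f"If {tests} Then PlayTennis={node}"]
    attribute, branches = next(iter(node.items()))
    rules = []
    for value, subtree in branches.items():
        rules += tree_to_rules(subtree, conditions + ((attribute, value),))
    return rules

for rule in tree_to_rules(tree):
    print(rule)    # prints R1-R5 above (modulo ordering and the "and" notation)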


Conclusions

1. Decision tree learning provides a practical method for concept learning.

2. ID3-like algorithms search a complete hypothesis space.

3. The inductive bias of decision trees is a preference (search) bias.

4. Overfitting the training data (you will see it ;-)) is an important issue in decision tree learning.

5. A large number of extensions of the ID3 algorithm have been proposed for overfitting avoidance, handling missing attributes, handling numerical attributes, etc. (feel free to try them out).