MACHINE LEARNING
What is learning?
A computer program learns if it improves its performance at some task through experience (T. Mitchell, 1997).
Any change in a system that allows it to perform better (Simon, 1983).
What do we learn:
Descriptions
Rules for how to recognize/classify objects, states, and events
Rules for how to transform an initial situation to achieve a goal (final state)
How do we learn:
Rote learning: storage of computed information.
Taking advice from others (advice may need to be operationalized).
Learning from problem-solving experiences: remembering experiences and generalizing from them (may add efficiency but not new knowledge).
Learning from examples (may or may not involve a teacher).
Learning by experimentation and discovery (decreasing the burden on the teacher, increasing the burden on the learner).
Approaches to Machine Learning
• Symbol-based learning
• Connectionist learning
• Evolutionary learning
Inductive Symbol-Based Machine Learning
Concept Learning
Version space search
Decision trees: the ID3 algorithm
Explanation-based learning
Supervised learning
Reinforcement learning
Version space search for concept learning
Concepts describe classes of objects.
Concepts consist of feature sets.
Operations on concept descriptions:
Generalization: replace a feature with a variable.
Specialization: instantiate a variable with a feature.
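As a minimal sketch of these two operations (Python, with attribute vectors as tuples and '?' standing for a variable; the function names are our own):

# A hypothesis is a tuple of attribute values; '?' plays the role of a variable.

def generalize(hypothesis, position):
    """Generalization: replace the feature at `position` with a variable."""
    h = list(hypothesis)
    h[position] = '?'
    return tuple(h)

def specialize(hypothesis, position, value):
    """Specialization: instantiate the variable at `position` with a feature."""
    h = list(hypothesis)
    h[position] = value
    return tuple(h)

# generalize(('Japan', 'Honda', 'Blue'), 2)       -> ('Japan', 'Honda', '?')
# specialize(('Japan', 'Honda', '?'), 2, 'White') -> ('Japan', 'Honda', 'White')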
Positive and negative examples of a concept
The concept description has to match all positive examples.
The concept description has to be false for the negative examples.
Plausible descriptions
The version space represents all the alternative plausible descriptions of the concept
A plausible description is one that is applicable to all known positive examples and no known negative example.
Algorithm: Candidate elimination
Given: a representation language and a set of positive and negative examples expressed in that language.
Compute: a concept description that is consistent with all the positive examples and none of the negative examples.
Hypotheses
The version space contains two sets of hypotheses:
G – the most general hypotheses that match the training data
S – the most specific hypotheses that match the training data
Each hypothesis is represented as a vector of values of the known attributes
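Concretely, one plausible encoding (our own) represents each hypothesis as a Python tuple; a hypothesis matches an example when every fixed attribute agrees:

def covers(hypothesis, example):
    """A hypothesis matches an example if every attribute
    is either the variable '?' or equal to the example's value."""
    return all(h == '?' or h == e for h, e in zip(hypothesis, example))

# The two boundary sets for a five-attribute language might start out as:
G = [('?', '?', '?', '?', '?')]                    # most general hypotheses
S = [('Japan', 'Honda', 'Blue', 1980, 'Economy')]  # most specific (a first positive example)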
Example of version space
Consider the task of obtaining a description of the concept "Japanese economy car".
The attributes under consideration are:
Origin, Manufacturer, Color, Decade, Type
Training data:
Positive ex: (Japan, Honda, Blue, 1980, Economy)
Positive ex: (Japan, Honda, White, 1980, Economy)
Negative ex: (Japan, Toyota, Green, 1970, Sports)
Example continued
A general hypothesis consistent with the data is:
(?, Honda, ?, ?, Economy), where the symbol '?' means that the attribute may take any value.
A specific hypothesis consistent with the examples is:
(Japan, Honda, ?, ?, Economy)
Algorithm: Candidate elimination
Initialize G to contain one element: the null description (all features are variables).
Initialize S to contain one element: the first positive example.
Accept a new training example.
Matching positive examples
Remove from G any descriptions that do not cover the example.
Update the S set to contain the most specific set of descriptions in the version space that cover the example and the current elements of the S set
(i.e., generalize the elements of S as little as possible so that they cover the new training example)
Matching negative examples
Remove from S any descriptions that cover the negative example.
Update the G set to contain the most general set of descriptions in the version space that do not cover the example
(i.e., specialize the elements of G as little as possible so that the negative example is no longer covered by any of the elements of G).
Comparing G and S
If S and G are both singleton sets, then:
if they are identical, output their value and halt;
if they are different, the training cases were inconsistent; output this result and halt.
Otherwise, continue accepting new training examples.
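Putting the steps together, here is one possible Python sketch of candidate elimination (simplified as in these slides: S is kept as a single most-specific hypothesis, the first training example is assumed positive, and all names are our own; the covers predicate is repeated so the sketch stands alone):

def covers(h, x):
    """A hypothesis covers an example if every attribute is '?' or matches."""
    return all(a == '?' or a == v for a, v in zip(h, x))

def more_general(h1, h2):
    """True if h1 is at least as general as h2."""
    return all(a == '?' or a == b for a, b in zip(h1, h2))

def generalize_minimally(s, x):
    """Generalize the specific hypothesis s just enough to cover positive x."""
    return tuple(a if a == v else '?' for a, v in zip(s, x))

def specialize_minimally(h, s, x):
    """Minimal specializations of h that exclude negative x, using values from s."""
    return [h[:i] + (s[i],) + h[i + 1:]
            for i in range(len(h))
            if h[i] == '?' and s[i] != '?' and s[i] != x[i]]

def candidate_elimination(examples):
    """examples: list of (attribute_tuple, is_positive) pairs;
    the first example must be positive. Returns the boundary sets (S, G)."""
    s = examples[0][0]                          # S: the first positive example
    g = [('?',) * len(s)]                       # G: the null description
    for x, is_positive in examples[1:]:
        if is_positive:
            g = [h for h in g if covers(h, x)]  # drop general hypotheses that miss x
            s = generalize_minimally(s, x)      # minimally generalize S
        else:
            new_g = []
            for h in g:
                if covers(h, x):                # h wrongly covers the negative example
                    new_g.extend(specialize_minimally(h, s, x))
                else:
                    new_g.append(h)
            # keep only maximally general hypotheses
            g = [h for h in new_g
                 if not any(o != h and more_general(o, h) for o in new_g)]
    return s, g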
Learning the concept of "Japanese economy car"
Features: Origin, Manufacturer, Color, Decade, Type
POSITIVE EXAMPLE: (Japan, Honda, Blue, 1980, Economy)
Initialize G to the singleton set that includes everything.
Initialize S to the singleton set containing the first positive example.
G = {(?, ?, ?, ?, ?)}
S = {(Japan, Honda, Blue, 1980, Economy)}
Example continued
NEGATIVE EXAMPLE: (Japan, Toyota, Green, 1970, Sports)
Specialize G to exclude the negative example.
G = {(?, Honda, ?, ?, ?), (?, ?, Blue, ?, ?), (?, ?, ?, 1980, ?), (?, ?, ?, ?, Economy)}
S = {(Japan, Honda, Blue, 1980, Economy)}
Example continued
POSITIVE EXAMPLE: (Japan, Toyota, Blue, 1990, Economy)
Remove from G descriptions inconsistent with positive example
Generalize S to include the positive example.
G = {(?, ?, Blue, ?, ?), (?, ?, ?, ?, Economy)}
S = {(Japan, ?, Blue, ?, Economy)}
Example continued
NEGATIVE EXAMPLE: (USA, Chrysler, Red, 1980, Economy)
Specialize G to exclude negative example (but staying within version space, i.e., staying consistent with S)
G = {(?, ?, Blue, ?, ?), (Japan, ?, ?, ?, Economy)}
S = {(Japan, ?, Blue, ?, Economy)}
Example continued
POSITIVE EXAMPLE: (Japan, Honda, White, 1980, Economy)
Remove from G descriptions inconsistent with positive example
Generalize S to include the positive example.
G = {(Japan, ?, ?, ?, Economy)}
S = {(Japan, ?, ?, ?, Economy)}
S = G, both singleton => done!
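For reference, running the candidate-elimination sketch given earlier on these five training examples reproduces the trace above (the tuple encoding of the data is our own):

training = [
    (('Japan', 'Honda',    'Blue',  1980, 'Economy'), True),
    (('Japan', 'Toyota',   'Green', 1970, 'Sports'),  False),
    (('Japan', 'Toyota',   'Blue',  1990, 'Economy'), True),
    (('USA',   'Chrysler', 'Red',   1980, 'Economy'), False),
    (('Japan', 'Honda',    'White', 1980, 'Economy'), True),
]

s, g = candidate_elimination(training)
print(s)  # ('Japan', '?', '?', '?', 'Economy')
print(g)  # [('Japan', '?', '?', '?', 'Economy')]  -- S = G: learning has converged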
Decision trees
A decision tree is a structure that represents a procedure for classifying objects based on their attributes.
Each object is represented as a set of attribute/value pairs and a classification.
Example
A set of medical symptoms might be represented as follows:
        Cough  Fever  Weight  Pain     Classification
Mary    no     yes    normal  throat   flu
Fred    no     yes    normal  abdomen  appendicitis
Julie   yes    yes    skinny  none     flu
Elvis   yes    no     obese   chest    heart disease
The system is given a set of training instances along with their correct classifications and develops a decision tree based on these examples.
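For instance, the table above might be encoded as attribute/value dictionaries paired with classifications (a sketch; the encoding is our own):

# Each training instance: a dict of attribute/value pairs plus its classification.
instances = [
    ({'cough': 'no',  'fever': 'yes', 'weight': 'normal', 'pain': 'throat'},  'flu'),
    ({'cough': 'no',  'fever': 'yes', 'weight': 'normal', 'pain': 'abdomen'}, 'appendicitis'),
    ({'cough': 'yes', 'fever': 'yes', 'weight': 'skinny', 'pain': 'none'},    'flu'),
    ({'cough': 'yes', 'fever': 'no',  'weight': 'obese',  'pain': 'chest'},   'heart disease'),
]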
Choosing Good Attributes
If a crucial attribute is not represented, then no decision tree will be able to learn the concept.
If two training instances have the same representation but belong to different classes, then the attribute set is said to be inadequate. It is impossible for the decision tree to distinguish the instances.
Learning of Decision Trees
Algorithm: The ID3 learning algorithm (Quinlan, 1986)
If all examples from the training set S belong to the same class Cj, then label the leaf with Cj;
else:
select the "best" decision attribute A with values v1, v2, …, vn for the next node;
divide the training set S into S1, …, Sn according to the values v1, …, vn;
recursively build subtrees T1, …, Tn for S1, …, Sn;
generate the decision tree T.
Which attribute is best? (A runnable sketch of this choice follows the information-gain section below.)
Entropy
S – a sample of training examples; p+ (p-) is the proportion of positive (negative) examples in S.
Entropy(S) = the expected number of bits needed to encode the classification of an arbitrary member of S.
Information theory: an optimal-length code assigns -log2(p) bits to a message having probability p.
The expected number of bits to encode "+" or "-" for a random member of S:
Entropy(S) = -p+ log2(p+) - p- log2(p-)
Generally, for c different classes:
Entropy(S) = -Σ(i = 1..c) pi log2(pi)
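A direct Python translation of this formula (our own sketch, handling any number of classes):

import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i), where p_i is the proportion
    of examples in the sample that belong to class i."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

# entropy(['+', '+', '-', '-'])  -> 1.0  (maximally mixed two-class sample)
# entropy(['+', '+', '+', '+'])  -> 0.0  (a pure sample needs no bits)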
Information Gain Search Heuristic
Gain(S, A) – the expected reduction in entropy caused by partitioning the examples of S according to the attribute A; a measure of the effectiveness of an attribute in classifying the training data.
Values(A) – the set of possible values of the attribute A.
Sv – the subset of S for which attribute A has value v.
The best attribute is the one with maximal Gain(S, A); the aim is to minimize the number of tests needed to classify an example.

Gain(S, A) = Entropy(S) - Σ(v ∈ Values(A)) (|Sv| / |S|) · Entropy(Sv)
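A possible Python rendering of the gain computation and ID3's attribute choice (our own sketch; entropy() is the function from the previous sketch, repeated so the code stands alone, and the data is the medical example encoded earlier):

import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels, as defined above."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(examples, attribute):
    """Gain(S, A) = Entropy(S) - sum over v in Values(A) of |Sv|/|S| * Entropy(Sv).
    `examples` is a list of (attribute_dict, classification) pairs."""
    labels = [cls for _, cls in examples]
    values = {attrs[attribute] for attrs, _ in examples}
    remainder = 0.0
    for v in values:
        subset = [cls for attrs, cls in examples if attrs[attribute] == v]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - remainder

def best_attribute(examples, attributes):
    """ID3's choice: the attribute with maximal information gain."""
    return max(attributes, key=lambda a: gain(examples, a))

# On the medical data encoded earlier:
# best_attribute(instances, ['cough', 'fever', 'weight', 'pain'])  -> 'pain'
# (each pain value determines the class on that tiny sample)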
Examples of Training Examples
Sources
Ashwin Ram, College of Computing, Georgia Institute of Technology, Atlanta, 1990–93.
http://www.cc.gatech.edu/classes/cs3361_97_winter/learning.txt
J. Kubalik, Machine Learning I – Outline, Gerstner Laboratory for Intelligent Decision Making and Control.