Machine Learning Introduction
Machine learning
All the techniques that we have seen until now allow us to build intelligent systems
The limitation of these systems is that they can only solve the problems they are programmed for
But we should only consider a system intelligent if it is also able to observe its environment and learn from it
Real intelligence resides in adaptation: being able to integrate new knowledge, to solve new problems, to learn from mistakes
(LSI-FIB-UPC) Artificial Intelligence, Term 2009/2010
Goals
The goal is not to model human learning
The goal is to overcome the limitations of usual AI applications (KBS, planning, NLP, problem solving, ...):
Their limit is in the knowledge that they have
Their capacities can not reach outside those limits
It is not possible to foresee all possible problems from the beginning
We are looking for programs that can adapt without being reprogrammed
Does it work?
Does Machine Learning Really Work? Tom Mitchell. AI Magazine 1997
Where and for what can machine learning be applied?
Tasks very difficult to program (face recognition, voice, ...)
Adaptable applications (intelligent interfaces, spam filters, recommendation systems, ...)
Data mining (intelligent data analysis)
Types of machine learning
Inductive Learning: Models are built from the generalization of examples. We look for patterns that explain the common characteristics of the examples.
Deductive Learning: Deduction is applied to obtain generalizations from a solved example and its explanation.
Genetic learning: Algorithms inspired by the theory of evolution are applied to find general descriptions of groups of examples.
Connectionist learning: Generalization is performed by the adaptation mechanisms of artificial neural networks.
Machine Learning Inductive Learning
Inductive learning
It is the area with the largest number of methods
Goal: To discover general rules or concepts from a limited set of examples (common patterns)
It is based on the search for similar characteristics among examples
All its methods are based on inductive reasoning
Inductive reasoning vs Deductive reasoning
Inductive reasoning
It obtains general knowledge from specific information
The knowledge obtained is new
It is not truth preserving (new information can invalidate the knowledge obtained)
It has no well-founded theory
Deductive reasoning
It obtains general knowledge from general knowledge
The knowledge is not new (it is implicit in the initial knowledge)
New knowledge can not invalidate the knowledge already obtained
Its basis is mathematical logic
Inductive learning
From a formal point of view its results are invalid
We suppose that a limited number of examples represents the characteristics of the concept that we want to learn
Just one counterexample invalidates the results
But, most of the human learning is inductive!
Machine Learning Search and inductive learning
Learning as search (I)
The usual way to view inductive learning is as a search problem
The goal is to discover a function/representation that summarizes the characteristics of a set of examples
The search space is all the possible concepts that can be built
There are different ways to perform the search
Learning as search (II)
Search space: Language used to describe the concepts =⇒ Set of concepts that can be described by the language
Search operators: Heuristic operators that allow us to explore the space of concepts
Heuristic function: Preference function that guides the search (bias)
Types of inductive learning
Supervised inductive learning
Each example is labeled with the concept it belongs to
Learning is performed by contrast among concepts
A set of heuristics allows us to generate different hypotheses
There is a criterion of preference (bias) that allows us to choose the hypothesis most suitable for the examples
Result: The concept or concepts that best describe the examples
Types of inductive learning
Unsupervised inductive learning
Examples are not labeled
We want to discover a suitable way to cluster the objects
Learning is based on the discovery of similarity/dissimilarity among examples
A heuristic preference criterion will guide the search
Result: A partition of the examples and a characterization of the partitions
Machine Learning Decision Trees
Decision Trees
We can learn a concept as the set of questions that allows us to distinguish it from others
Using a tree as representation formalism we can store and organize these questions
Each node of the tree is a question about an attribute
The search space is the set of all possible trees of questions
This representation is equivalent to a DNF formula (there are 2^(2^n) possible concepts over n binary attributes)
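That count can be verified by brute force (an illustrative enumeration, not part of the original slides): with n binary attributes there are 2^n possible attribute combinations, and a concept may label each combination positive or negative independently, giving 2^(2^n) distinct concepts.

```python
from itertools import product

def count_boolean_concepts(n):
    """Enumerate every possible truth table over n binary attributes."""
    rows = list(product([False, True], repeat=n))           # 2^n input rows
    tables = set(product([False, True], repeat=len(rows)))  # one label per row
    return len(tables)

print(count_boolean_concepts(2))  # 16 concepts = 2^(2^2)
print(count_boolean_concepts(3))  # 256 concepts = 2^(2^3)
```

This double exponential is why the bias discussed next is needed: the space is far too large to search exhaustively.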
Decision Trees
To reduce the computational cost of searching this space we must choose a bias (what kind of concepts are preferred)
Decision: The tree that gives the minimal description of the goal concept given a set of examples
Reason: Such a tree will be the best at predicting new instances (the probability that unnecessary conditions appear is reduced)
Occam’s razor: the hypothesis that introduces the fewest assumptions and postulates the fewest entities is to be preferred
Algorithms for Decision Trees
One of the first algorithms for building decision trees is ID3 (Quinlan 1986)
It is in the family of algorithms for Top Down Induction of Decision Trees (TDIDT)
ID3 performs a search using a Hill-Climbing strategy in the space of decision trees
For each level of the tree an attribute is chosen and the set of examples is split using the values of the attribute. This process is repeated recursively for each partition
The selection of the attribute is performed using a heuristic function
Information Theory
Information theory studies, among other things, the coding of messages and the cost of their transmission
If we define a set of messages M = {m1, m2, ..., mn}, each one with probability P(mi), we can define the quantity of information (I) that the set M contains as:
I(M) = \sum_{i=1}^{n} -P(m_i) \log(P(m_i))
This value can be interpreted as the information needed to discriminate the messages from M (number of bits necessary to code the messages)
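The definition can be sketched directly (an illustrative helper, not from the slides; base-2 logs so the result is in bits, with the usual convention that zero-probability terms contribute 0):

```python
import math

def information(probs):
    """I(M) = sum_i -P(m_i) * log2(P(m_i)), in bits."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Four equiprobable messages need 2 bits to discriminate:
print(information([0.25, 0.25, 0.25, 0.25]))  # 2.0
# A certain message carries no information:
print(information([1.0]))  # 0.0
```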
Quantity of information as heuristic
We can use an analogy from message coding, assuming that the classes are messages and the proportion of examples from each class is their probability
A decision tree can be seen as the coding that allows us to discriminate among classes
We are looking for the minimal code that discriminates among classes
Each attribute is evaluated to decide if it is a part of the code
An attribute is better than another if it allows us to discriminate better among classes
Quantity of information as heuristic
At each level of the tree we have to find the attribute that allows us to minimize the code (minimizes the size of the tree)
That is the attribute that leaves the least quantity of information to be covered by the remaining attributes
The selection of an attribute should result in subsets of examples that are biased towards one class
We need a measure of the quantity of information not covered by an attribute (Entropy, E)
Information Gain
Quantity of information (X - examples, C - classification, #S - cardinality of S)

I(X, C) = \sum_{c_i \in C} -\frac{\#c_i}{\#X} \log\left(\frac{\#c_i}{\#X}\right)

Entropy (A - attribute, [A(x) = v_i] - examples with value v_i)

E(X, A, C) = \sum_{v_i \in A} \frac{\#[A(x) = v_i]}{\#X} \, I([A(x) = v_i], C)

Information Gain

G(X, A, C) = I(X, C) - E(X, A, C)
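The three measures translate almost directly to Python (an illustrative sketch, not slide code; examples are represented here as (attribute-dict, class) pairs, an assumed encoding):

```python
import math
from collections import Counter

def info(examples):
    """I(X, C): information of the class distribution of the examples."""
    n = len(examples)
    counts = Counter(cls for _, cls in examples)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

def entropy(examples, attr):
    """E(X, A, C): information left after splitting on attribute attr."""
    n = len(examples)
    total = 0.0
    for v in {attrs[attr] for attrs, _ in examples}:
        subset = [(a, c) for a, c in examples if a[attr] == v]
        total += len(subset) / n * info(subset)
    return total

def gain(examples, attr):
    """G(X, A, C) = I(X, C) - E(X, A, C)."""
    return info(examples) - entropy(examples, attr)

# An attribute that separates the classes perfectly gains all the information:
data = [({"a": "x"}, "+"), ({"a": "x"}, "+"),
        ({"a": "y"}, "-"), ({"a": "y"}, "-")]
print(gain(data, "a"))  # 1.0
```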
Information Gain
(Diagram: the example set X, with class counts C1 C2 C3 and information I(X,C), is split by attribute A into subsets for A=v1, A=v2, A=v3; the weighted information remaining in the subsets is E(X,A,C), and G(X,A,C) = I(X,C) − E(X,A,C))
ID3 Algorithm
Algorithm: ID3 (X : Examples, C: Classification, A: Attributes)
if all examples are from the same classthen
return a leave with the class nameelse
Compute the quantity of information of the examples (I)foreach attribute in A do
Compute the entropy (E) and the information gain (G)
Pick the attribute that maximizes G (a)Delete a from the list of attributes (A)Generate a root node for the attribute aforeach partition generated by the values of the attribute a do
Treei=ID3(X (a=vi ), C(a=vi ),A-a)generate a new branch with a=vi and Treei
return the root node for a
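The pseudocode above can be sketched as a small runnable program (an illustration under assumed conventions: examples as (attribute-dict, class) pairs, majority class when no attributes remain, gain recomputed inline):

```python
import math
from collections import Counter

def info(examples):
    n = len(examples)
    return sum(-(c / n) * math.log2(c / n)
               for c in Counter(cls for _, cls in examples).values())

def gain(examples, attr):
    n = len(examples)
    e = 0.0
    for v in {attrs[attr] for attrs, _ in examples}:
        subset = [(a, c) for a, c in examples if a[attr] == v]
        e += len(subset) / n * info(subset)
    return info(examples) - e

def id3(examples, attributes):
    """Return a class label (leaf) or (attribute, {value: subtree})."""
    classes = {cls for _, cls in examples}
    if len(classes) == 1:                 # all examples in the same class
        return classes.pop()
    if not attributes:                    # no attributes left: majority class
        return Counter(cls for _, cls in examples).most_common(1)[0][0]
    best = max(attributes, key=lambda attr: gain(examples, attr))
    rest = [attr for attr in attributes if attr != best]
    branches = {}
    for v in {attrs[best] for attrs, _ in examples}:
        subset = [(a, c) for a, c in examples if a[best] == v]
        branches[v] = id3(subset, rest)
    return (best, branches)

toy = [({"a": "x", "b": "p"}, "+"), ({"a": "x", "b": "q"}, "+"),
       ({"a": "y", "b": "p"}, "-"), ({"a": "y", "b": "q"}, "-")]
tree = id3(toy, ["a", "b"])
print(tree[0])  # a  (attribute "a" has gain 1, "b" has gain 0)
```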
Example (1)
Consider the following set of examples:

Ex.  Eyes   Hair    Height  Class
1    Blue   Blonde  Tall    +
2    Blue   Dark    Medium  +
3    Brown  Dark    Medium  −
4    Green  Dark    Medium  −
5    Green  Dark    Tall    +
6    Brown  Dark    Small   −
7    Green  Blonde  Small   −
8    Blue   Dark    Medium  +
Example (2)
I(X, C) = −1/2·log(1/2) − 1/2·log(1/2) = 1

E(X, eyes) = (blue) 3/8 · (−1·log(1) − 0·log(0))
           + (brown) 2/8 · (−1·log(1) − 0·log(0))
           + (green) 3/8 · (−1/3·log(1/3) − 2/3·log(2/3))
           = 0.344

E(X, hair) = (blonde) 2/8 · (−1/2·log(1/2) − 1/2·log(1/2))
           + (dark) 6/8 · (−1/2·log(1/2) − 1/2·log(1/2))
           = 1

E(X, height) = (tall) 2/8 · (−1·log(1) − 0·log(0))
             + (medium) 4/8 · (−1/2·log(1/2) − 1/2·log(1/2))
             + (small) 2/8 · (−0·log(0) − 1·log(1))
             = 0.5
Example (3)
We can see that the attribute eyes is the one that maximizes the function.
G(X, eyes) = 1 − 0.344 = 0.656 (*)
G(X, hair) = 1 − 1 = 0
G(X, height) = 1 − 0.5 = 0.5
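These values can be checked mechanically; a sketch that recomputes the three gains from the example table using the I/E/G definitions from the earlier slide (the tuple encoding and column indices are illustrative choices):

```python
import math
from collections import Counter

# (eyes, hair, height, class), transcribed from the example table
data = [
    ("blue",  "blonde", "tall",   "+"), ("blue",  "dark", "medium", "+"),
    ("brown", "dark",   "medium", "-"), ("green", "dark", "medium", "-"),
    ("green", "dark",   "tall",   "+"), ("brown", "dark", "small",  "-"),
    ("green", "blonde", "small",  "-"), ("blue",  "dark", "medium", "+"),
]

def info(rows):
    n = len(rows)
    return sum(-(c / n) * math.log2(c / n)
               for c in Counter(r[-1] for r in rows).values())

def gain(rows, col):
    n = len(rows)
    e = 0.0
    for v in {r[col] for r in rows}:
        subset = [r for r in rows if r[col] == v]
        e += len(subset) / n * info(subset)
    return info(rows) - e

for name, col in [("eyes", 0), ("hair", 1), ("height", 2)]:
    print(name, round(gain(data, col), 3))
# eyes 0.656
# hair 0.0
# height 0.5
```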
Example (4)
This attribute generates the first level of the tree:
(Tree diagram: root node EYES; BLUE → examples {1,2,8}, class +; BROWN → examples {3,6}, class −; GREEN → examples {4,5,7}, mixed: 4 and 7 are −, 5 is +)
Example (5)
Now only the node corresponding to the value green contains a mix of classes, so we repeat the process with these examples.

Ex.  Hair    Height  Class
4    Dark    Medium  −
5    Dark    Tall    +
7    Blonde  Small   −
Example (6)
I(X, C) = −1/3·log(1/3) − 2/3·log(2/3) = 0.918

E(X, hair) = (blonde) 1/3 · (−0·log(0) − 1·log(1))
           + (dark) 2/3 · (−1/2·log(1/2) − 1/2·log(1/2))
           = 0.666

E(X, height) = (tall) 1/3 · (−0·log(0) − 1·log(1))
             + (medium) 1/3 · (−1·log(1) − 0·log(0))
             + (small) 1/3 · (−0·log(0) − 1·log(1))
             = 0
Example (7)
Now the attribute with the maximum gain is height.

G(X, hair) = 0.918 − 0.666 = 0.252
G(X, height) = 0.918 − 0 = 0.918 (*)
Example (8)
The resulting tree is fully discriminant.
(Final tree diagram: root EYES; BLUE → {1,2,8}, class +; BROWN → {3,6}, class −; GREEN → node HEIGHT, with branches TALL → {5}, class +; MEDIUM → {4}, class −; SMALL → {7}, class −)