52
Buenos Aires, abril de 2016 Eduardo Poggi http://www.shutterstock.com/

Poggi analytics - concepts - 1a

Embed Size (px)

Citation preview

Page 1: Poggi   analytics - concepts - 1a

Buenos Aires, abril de 2016Eduardo Poggi

http://www.shutterstock.com/

Page 2: Poggi   analytics - concepts - 1a

Concept Learning

Definitions Search Space and General-Specific Ordering Concept learning as search FIND-S The Candidate Elimination Algorithm Inductive Bias

Page 3: Poggi   analytics - concepts - 1a

First definition

The problem is to learn a function mapping examples into two classes: positive and negative.

We are given a database of examples already classified as positive or negative.

Concept learning: the process of inducing a function mapping input examples into a Boolean output.

Page 4: Poggi   analytics - concepts - 1a

Notation

Set of instances X Target concept c : X {+,-} Training examples E = {(x , c(x))} Data set D X Set of possible hypotheses H h H / h : X {+,-} Goal: Find h / h(x)=c(x)

Page 5: Poggi   analytics - concepts - 1a

Representation of Examples

Features:• color {red, brown, gray}• size {small, large}• shape {round,elongated}• land {humid,dry}• air humidity {low,high}• texture {smooth, rough}

Page 6: Poggi   analytics - concepts - 1a

The Input and Output Space

XOnly a small subset is contained

in our database.

Y = {+,-}

X : The space of all possible examples (input space). Y: The space of classes (output space).An example in X is a feature vector X.For instance: X = (red,small,elongated,humid,low,rough) X is the cross product of all feature values.

Page 7: Poggi   analytics - concepts - 1a

The Training Examples

D: The set of training examples. D is a set of pairs { (x,c(x)) }, where c is the target concept

Example of D:((red,small,round,humid,low,smooth), +)((red,small,elongated,humid,low,smooth),+)((gray,large,elongated,humid,low,rough), -)((red,small,elongated,humid,high,rough), +)

Instances from the input space Instances fromthe output space

Page 8: Poggi   analytics - concepts - 1a

Hypothesis Representation

Consider the following hypotheses: (*,*,*,*,*,*): all mushrooms are poisonous (0,0,0,0,0,0): no mushroom is poisonous

Special symbols: * Any value is acceptable 0 no value is acceptable

Any hypothesis h is a function from X to Y h: X Y

We will explore the space of conjunctions.

Page 9: Poggi   analytics - concepts - 1a

Hypothesis Space

The space of all hypotheses is represented by H

Let h be a hypothesis in H. Let X be an example of a mushroom. if h(X) = + then X is poisonous,

otherwise X is not-poisonous Our goal is to find the hypothesis, h*, that is very “close” to target concept c. A hypothesis is said to “cover” those examples it classifies as positive.

X

h

Page 10: Poggi   analytics - concepts - 1a

Assumption 1

We will explore the space of all conjunctions.We assume the target concept falls within this space.

Target concept c

H

Page 11: Poggi   analytics - concepts - 1a

Assumption 2

A hypothesis close to target concept c obtained afterseeing many training examples will result in high accuracy on the set of unobserved examples.

Training set D

Hypothesis h* is good

Complement set D’

Hypothesis h* is good

Page 12: Poggi   analytics - concepts - 1a

Concept Learning as Search

There is a general to specific ordering inherent to any hypothesis space.

Consider these two hypotheses:

h1 = (red,*,*,humid,*,*)h2 = (red,*,*,*,*,*)

We say h2 is more general than h1 because h2 classifies more instances than h1 and h1 is covered by h2.

Page 13: Poggi   analytics - concepts - 1a

General-Specific

For example, consider the following hypotheses:

h1

h2 h3

h1 is more general than h2 and h3. h2 and h3 are neither more specific nor more general than each other.

Page 14: Poggi   analytics - concepts - 1a

Let hj and hk be two hypotheses mapping examples into {+,-}.We say hj is more general than hk iff

For all examples X, hk(X) = + hj(X) = +

We represent this fact as hj >= hk

The >= relation imposes a partial ordering over the hypothesis space H (reflexive, antisymmetric, and transitive).

Definition

Page 15: Poggi   analytics - concepts - 1a

Lattice

Any input space X defines then a lattice of hypotheses orderedaccording to the general-specific relation:

h1

h3 h4

h2

h5 h6

h7 h8

Page 16: Poggi   analytics - concepts - 1a

Working Example: Mushrooms

Class of Tasks: Predicting poisonous mushroomsPerformance: Accuracy of ClassificationExperience: Database describing mushrooms with their class

Knowledge to learn: Function mapping mushrooms to {+,-}

where -:not-poisonous and +:poisonousRepresentation of target knowledge:

conjunction of attribute values.Learning mechanism:

Find-S

Page 17: Poggi   analytics - concepts - 1a

Finding a Maximally-Specific Hypothesis

Algorithm to search the space of conjunctions: Start with the most specific hypothesis Generalize the hypothesis when it fails to cover a positive

example

Algorithm:1. Initialize h to the most specific hypothesis2. For each positive training example X For each value a in h

If example X and h agree on a, do nothing else generalize a by the next more general constraint3. Output hypothesis h

Page 18: Poggi   analytics - concepts - 1a

Example

Let’s run the learning algorithm above with the following examples:((red,small,round,humid,low,smooth), +)((red,small,elongated,humid,low,smooth),+)((gray,large,elongated,humid,low,rough), -)((red,small,elongated,humid,high,rough), +)

We start with the most specific hypothesis:h = (0,0,0,0,0,0)

The first example comes and since the example is positive and h fails to cover it, we simply generalize h to cover exactly this example: h = (red,small,round,humid,low,smooth)

Page 19: Poggi   analytics - concepts - 1a

Example

Hypothesis h basically says that the first example is the onlypositive example, all other examples are negative.

Then comes examples 2: ((red,small,elongated,humid,low,smooth), poisonous)

This example is positive. All attributes match hypothesis h except for attribute shape: it has the value elongated, not round.

We generalize this attribute using symbol * yielding:h: (red,small,*,humid,low,smooth)The third example is negative and so we just ignore it. Why is it we don’t need to be concerned with negative

examples?

Page 20: Poggi   analytics - concepts - 1a

Example

Upon observing the 4th example, hypothesis h is generalized to the following:

h = (red,small,*,humid,*,*)

h is interpreted as any mushroom that is red, small and found on humid land should be classified as poisonous.

Page 21: Poggi   analytics - concepts - 1a

Analyzing the Algorithm

• The algorithm is guaranteed to find the hypothesis that is most specific and consistent with the set of training examples.

• It takes advantage of the general-specific ordering to move on the corresponding lattice searching for the next most specific hypothesis.

h1

h3 h4

h2

h5 h6

h7 h8

Page 22: Poggi   analytics - concepts - 1a

X-H Relation

Page 23: Poggi   analytics - concepts - 1a

X-H Relation

Page 24: Poggi   analytics - concepts - 1a

Points to Consider

There are many hypotheses consistent with the training data D.

Why should we prefer the most specific hypothesis? What would happen if the examples are not consistent? What would happen if they have errors, noise? What if there is a hypothesis space H where one can find more

that one maximally specific hypothesis h? The search over the lattice must then be different to allow for

this possibility.

Page 25: Poggi   analytics - concepts - 1a

Summary FIND-S

The input space is the space of all examples; the output space is the space of all classes.

A hypothesis maps examples into classes. We want a hypothesis close to target concept c. The input space establishes a partial ordering over the

hypothesis space. One can exploit this ordering to move along the

corresponding lattice.

Page 26: Poggi   analytics - concepts - 1a

Working Example: Mushrooms

Class of Tasks: Predicting poisonous mushroomsPerformance: Accuracy of ClassificationExperience: Database describing mushrooms with their class

Knowledge to learn: Function mapping mushrooms to {+,-}

where -:not-poisonous and +:poisonousRepresentation of target knowledge:

conjunction of attribute values.Learning mechanism:

candidate-elimination

Page 27: Poggi   analytics - concepts - 1a

Candidate Elimination

The algorithm that finds the maximally specific hypothesis is limited in that it only finds one of many hypotheses consistent with the training data.

The Candidate Elimination Algorithm (CEA) finds ALL hypotheses consistent with the training data.

CEA does that without explicitly enumerating all consistent hypotheses.

Page 28: Poggi   analytics - concepts - 1a

Consistency vs Coverage

h1

h2

h1 covers a different set of examples than h2h2 is consistent with training set Dh1 is not consistent with training set D

Positive examplesNegative examples

Training set D

-

-

-

-++

+++ +

+

Page 29: Poggi   analytics - concepts - 1a

Version Space VS

Hypothesis space H

Version space: Subset of hypothesis from H consistent with training set D.

Page 30: Poggi   analytics - concepts - 1a

List-Then-Eliminate Algorithm

Algorithm:

1. Version Space VS: All hypotheses in H2. For each training example X Remove every hypothesis h in H inconsistent

with X: h(x) = c(x)

3. Output the version space VS

Comments: This is unfeasible. The size of H is unmanageable.

Page 31: Poggi   analytics - concepts - 1a

Previous Exercise: Mushrooms

Let’s remember our exercise in which we tried to classifymushrooms as poisonous (+) or not-poisonous (-).

Training set D:

((red,small,round,humid,low,smooth), +)((red,small,elongated,humid,low,smooth), +)((gray,large,elongated,humid,low,rough), -)((red,small,elongated,humid,high,rough), +)

Page 32: Poggi   analytics - concepts - 1a

Consistent Hypotheses

Our first algorithm found only one out of six consistent hypotheses:

(red,small,*,humid,*,*)

(*,small,*,humid,*,*)(red,*,*,humid,*,*) (red,small,*,*,*,*)

(red,*,*,*,*,*) (*,small,*,*,*,*)G:

S:S: Most specificG: Most general

Page 33: Poggi   analytics - concepts - 1a

Candidate-Elimination Algorithm

(red,small,*,humid,*,*)

(red,*,*,*,*,*)(*,small,*,*,*,*)G:

S:

The candidate elimination algorithm keeps two listsof hypotheses consistent with the training data:The list of most specific hypotheses S andThe list of most general hypotheses G

This is enough to derive the whole version space VS.

VS

Page 34: Poggi   analytics - concepts - 1a

Candidate-Elimination Algorithm

• Initialize G to the set of maximally general hypotheses in H• Initialize S to the set of maximally specific hypotheses in H• For each training example X do

• If X is positive: generalize S if necessary• If X is negative: specialize G if necessary

• Output {G,S}

Page 35: Poggi   analytics - concepts - 1a

Candidate-Elimination Algorithm

Initialize G to the set of maximally general hypotheses in H Initialize S to the set of maximally specific hypotheses in H For each training example d, do

If d+ Remove from G any hypothesis inconsistent with d For each hypothesis s in S that is not consistent with d

Remove s from S Add to S all minimal generalizations h of s such that h is

consistent with d and some member of G is more general than h Remove from S any hipothesis that is more general than

another hypothesis in S If d-

Remove from S any hypothesis inconsistent with d For each hypothesis g in G that is not consistent with d

Remove g from G Add to G all minimal specializations h of g such that h is

consistent with d and some member of S is more general than h Remove from G any hipothesis that is less general than another

hypothesis in G

Page 36: Poggi   analytics - concepts - 1a

Positive Examples

a) If X is positive: Remove from G any hypothesis inconsistent with X For each hypothesis h in S not consistent with X

Remove h from S Add all minimal generalizations of h consistent with X

such that some member of G is more general than h Remove from S any hypothesis more general than any other hypothesis in S

G:

S:

hinconsistent

add minimal generalizations

Page 37: Poggi   analytics - concepts - 1a

Negative Examples

b) If X is negative:Remove from S any hypothesis inconsistent with XFor each hypothesis h in G not consistent with X

Remove g from GAdd all minimal generalizations of h consistent with X

such that some member of S is more specific than hRemove from G any hypothesis less general than any other

hypothesis in G

G:

S: h inconsistent

add minimal specializations

Page 38: Poggi   analytics - concepts - 1a

An Exercise

Initialize the S and G sets:

S: (0,0,0,0,0,0)G: (*,*,*,*,*,*)

Let’s look at the first two examples:

((red,small,round,humid,low,smooth), +)((red,small,elongated,humid,low,smooth), +)

Page 39: Poggi   analytics - concepts - 1a

An Exercise: two positives

The first two examples are positive:((red,small,round,humid,low,smooth), +)((red,small,elongated,humid,low,smooth), +)

S: (0,0,0,0,0,0)

(red,small,round,humid,low,smooth)

(red,small,*,humid,low,smooth)

G: (*,*,*,*,*,*)

generalize

specialize

Page 40: Poggi   analytics - concepts - 1a

An Exercise: first negative

The third example is a negative example:((gray,large,elongated,humid,low,rough), -)

S:(red,small,*,humid,low,smooth)

G: (*,*,*,*,*,*)

generalize

specialize

(red,*,*,*,*,*,*) (*,small,*,*,*,*) (*,*,*,*,*,smooth)

Why is (*,*,round,*,*,*) not a valid specialization of G

Page 41: Poggi   analytics - concepts - 1a

An Exercise: another positive

The fourth example is a positive example:((red,small,elongated,humid,high,rough), +)

S:(red,small,*,humid,low,smooth)generalize

specializeG: (red,*,*,*,*,*,*) (*,small,*,*,*,*) (*,*,*,*,*,smooth)

(red,small,*,humid,*,*)

Page 42: Poggi   analytics - concepts - 1a

The Learned Version Space VS

G: (red,*,*,*,*,*,*) (*,small,*,*,*,*)

S: (red,small,*,humid,*,*)

(red,*,*,humid,*,*) (red,small,*,*,*,*) (*,small,*,humid,*,*)

Page 43: Poggi   analytics - concepts - 1a

Points to Consider

Will the algorithm converge to the right hypothesis? The algorithm is guaranteed to converge to the right

hypothesis provided the following: No errors exist in the examples The target concept is included in the hypothesis space H

What happens if there exists errors in the examples? The right hypothesis would be inconsistent and thus eliminated. If the S and G sets converge to an empty space we have

evidence that the true concept lies outside space H.

Page 44: Poggi   analytics - concepts - 1a

Query Learning

Remember the version space VS after seeing our 4 exampleson the mushroom database:

G: (red,*,*,*,*,*,*) (*,small,*,*,*,*)

S: (red,small,*,humid,*,*)

(red,*,*,humid,*,*) (red,small,*,*,*,*) (*,small,*,humid,*,*)

What would be a good question to pose to the algorithm? What example is best next?

Page 45: Poggi   analytics - concepts - 1a

Query Learning

Remember there are three settings for learning: Tasks are generated by a random process outside the learner The learner can pose queries to a teacher The learner explores its surroundings autonomously

Here we focus on the second setting; posing queries to an expert.

Version space strategy: Ask about the class of an example that would prune half of the space.

Example: (red,small,round,dry,low,smooth)

Page 46: Poggi   analytics - concepts - 1a

Query Learning

In general if we are able to prune the version space by half on each new query then we can find an optimal hypothesis in the following

Number of steps: log2 |VS|

Can you explain why?

Page 47: Poggi   analytics - concepts - 1a

Classifying Examples

What if the version space VS has not collapsed into a single hypothesis and we are asked to classify a new instance?

Suppose all hypotheses in set S agree that the instance is positive.

Then we are sure that all hypotheses in VS agree the instance is positive. Why?

The same can be said if the instance is negative by all members of set G. Why?

In general one can vote over all hypotheses in VS if there is no unanimous agreement.

Page 48: Poggi   analytics - concepts - 1a

Inductive Bias

Inductive bias is the preference for a hypothesis space H and a search mechanism over H.

What would happen if we choose an H that contains all possible hypotheses?

What would the size of H be? |H| = Size of the power set of the input space X.

Example: You have n Boolean features. |X| = 2n And the size of H is 2^2^n

Page 49: Poggi   analytics - concepts - 1a

Inductive Bias

In this case, the candidate elimination algorithm would simply classify as positive the training examples it has seen. This is because H is so large, every possible hypothesis is contained within it.

A Property of any Inductive Algorithm:It must have some embedded assumptions about the nature of H.

Without assumptions learning is impossible.

Page 50: Poggi   analytics - concepts - 1a

Summary

The candidate elimination algorithm exploits the general-specific ordering of hypotheses to find all hypotheses consistent with the training data.

The version space contains all consistent hypotheses and is simply represented by two lists: S and G.

Candidate elimination algorithm is not robust to noise and assumes the target concept is included in the hypothesis space.

Any inductive algorithm needs some assumptions about the hypothesis space, otherwise it would be impossible to perform predictions.

Page 51: Poggi   analytics - concepts - 1a

[email protected]

eduardo-poggi

http://ar.linkedin.com/in/eduardoapoggi

https://www.facebook.com/eduardo.poggi

@eduardoapoggi

Page 52: Poggi   analytics - concepts - 1a

Bibliografía

Capítulo 2 de Mitchell