42
Concept Learning Mitchell, Chapter 2 CptS 570 Machine Learning School of EECS Washington State University

Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Concept Learning Mitchell, Chapter 2

CptS

570 Machine LearningSchool of EECS

Washington State University

Page 2: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Outline

DefinitionGeneral-to-specific ordering over hypothesesVersion spaces and the candidate elimination algorithmInductive bias

Page 3: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Concept Learning

DefinitionInferring a boolean-valued function from training examples of its input and output.

ExampleConcept:

Training examples:

321 xxxf ∨=x1 x2 x3 f

0 0 0 0

0 0 1 0

0 1 0 1

0 1 1 0

1 0 0 1

1 0 1 1

1 1 0 1

1 1 1 1

Page 4: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Example: Enjoy Sport

Learn a concept for predicting whether you will enjoy a sport based on the weatherTraining examples

What is the general concept?

Example Sky AirTemp Humidity Wind Water Forecast EnjoySport

1 Sunny Warm Normal Strong Warm Same Yes

2 Sunny Warm High Strong Warm Same Yes

3 Rainy Cold High Strong Warm Change No

4 Sunny Warm High Strong Cool Change Yes

Page 5: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Learning Task: Enjoy Sport

Task TAccurately predict enjoyment

Performance PPredictive accuracy

Experience ETraining examples each with attribute values and class value (yes or no)

Page 6: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Representing HypothesesMany possible representationsLet hypothesis h be a conjunction of constraints on attributes

Hypothesis space H is the set of all possible hypotheses hEach constraint can be

Specific value (e.g., Water = Warm)Don’t care (e.g., Water = ?)No value is acceptable (e.g., Water = Ø)

For example<Sunny, ?, ?, Strong, ?, Same>I.e., if (Sky=Sunny) and (Wind=Strong) and (Forecast=Same), then EnjoySport=Yes

Page 7: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Concept Learning TaskGiven

Instances X: Possible daysEach described by the attributes: Sky, AirTemp, Humidity, Wind, Water, Forecast

Target function c: EnjoySport {0,1}Hypotheses H: Conjunctions of literals

E.g., <?, Cold, High, ?, ?, ?>Training examples D

Positive and negative examples of the target function<x1,c(x1)>, …, <xm,c(xm)>

DetermineA hypothesis h in H such that h(x) = c(x) for all x in D

Page 8: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

TerminologyInstances or instance space X

Set of all possible input itemsE.g., x = <Sunny, Warm, Normal, Strong, Warm, Same>|X| = 3*2*2*2*2*2 = 96

Target concept c : X {0,1}Concept or function to be learnedE.g., c(x)=1 if EnjoySport=yes, c(x)=0 if EnjoySport=no

Training examples D = {<x, c(x)>}, x∈ XPositive examples, c(x) = 1, members of target conceptNegative examples, c(x) = 0, non-members of target concept

Page 9: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

TerminologyHypothesis space H

Set of all possible hypothesesDepends on choice of representationE.g., conjunctive concepts for EnjoySport

(5*4*4*4*4*4) = 5120 syntactically distinct hypotheses(4*3*3*3*3*3) + 1 = 973 semantically distinct hypothesesAny hypothesis with Ø classifies all examples as negative

Want h∈ H such that h(x) = c(x) for all x∈ XMost general hypothesis

<?,?,?,?,?,?>Most specific hypothesis

<Ø, Ø, Ø, Ø, Ø, Ø>

Page 10: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Terminology

Inductive learning hypothesisAny hypothesis approximating the target concept well, over a sufficiently large set of training examples, will also approximate the target concept well for unobserved examples

Page 11: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Concept Learning as Search

Learning viewed as a search through hypothesis space H for a hypothesis consistent with the training examplesGeneral-to-specific ordering of hypotheses

Allows more directed search of H

Page 12: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

General-to-Specific Ordering of Hypotheses

Page 13: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

General-to-Specific Ordering of Hypotheses

Hypothesis h1 is more general than or equal to hypothesis h2 iff ∀x ∈ X, h1(x)=1 ← h2(x)=1Written h1 ≥g h2

h1 strictly more general than h2 (h1 <g h2) when h1 ≥g h2 and h2 ≥g h1

Also implies h2 ≥g h1, h2 more specific than h1

Defines partial order over H

Page 14: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Finding Maximally-Specific Hypothesis

Find the most specific hypothesis covering all positive examplesHypothesis h covers positive example xif h(x) = 1Find-S algorithm

Page 15: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Find-S Algorithm

Initialize h to the most specific hypothesis in HFor each positive training instance x

For each attribute constraint ai in hIf the constraint ai in h is satisfied by xThen do nothingElse replace ai in h by the next more general constraint that is satisfied by x

Output hypothesis h

Page 16: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Find-S Example

Page 17: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Find-S Algorithm

Will h ever cover a negative example?No, if c ∈ H and training examples consistent

Problems with Find-SCannot tell if converged on target concept Why prefer the most specific hypothesis?Handling inconsistent training examples due to errors or noiseWhat if more than one maximally-specific consistent hypothesis?

Page 18: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Version Spaces

Hypothesis h is consistent with training examples D iff h(x) = c(x) for all <x,c(x)> ∈ DVersion space is all hypotheses in Hconsistent with D

VSH,D = {h ∈ H | consistent(h, D)}

Page 19: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Representing Version SpacesThe general boundary G of version space VSH,D is the set of its maximally general membersThe specific boundary S of version space VSH,D is the set of its maximally specific membersEvery member of the version space lies in or between these members

“Between” means more specific than G and more general than SThm. 2.1. Version space representation theorem

So, version space can be represented by just G and S

Page 20: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Version Space Example

Version space resulting from previous four EnjoySport

examples.

Page 21: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Finding the Version Space

List-Then-EliminateVS = list of every hypothesis in HFor each training example <x,c(x)> ∈ D

Remove from VS any h where h(x) ≠ c(x)

Return VS

Impractical for all but most trivial H’s

Page 22: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Candidate Elimination Algorithm

Initialize G to the set of maximally general hypotheses in HInitialize S to the set of maximally specific hypotheses in HFor each training example d, do

If d is a positive example …If d is a negative example …

Page 23: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Candidate Elimination Algorithm

If d is a positive exampleRemove from G any hypothesis inconsistent with dFor each hypothesis s in S that is not consistent with d

Remove s from SAdd to S all minimal generalizations h of s such that

h is consistent with d, and some member of G is more general than h

Remove from S any hypothesis that is more general than another hypothesis in S

Page 24: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Candidate Elimination Algorithm

If d is a negative exampleRemove from S any hypothesis inconsistent with dFor each hypothesis g in G that is not consistent with d

Remove g from GAdd to G all minimal specializations h of g such that

h is consistent with d, and some member of S is more specific than h

Remove from G any hypothesis that is less general than another hypothesis in G

Page 25: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Example

Page 26: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Example (cont.)

Page 27: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Example (cont.)

Page 28: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Example (cont.)

Page 29: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Version Spaces and the Candidate Elimination Algorithm

Will CE converge to correct hypothesis?Yes, if no errors and target concept in HConvergence: S = G = {hfinal}Otherwise, eventually S = G = {}

Final VS independent of training sequenceG can grow exponentially in |D|, even for conjunctive H

Page 30: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Version Spaces and the Candidate Elimination Algorithm

Which training example requested next?Learner may query oracle for example’s classificationIdeally, choose example eliminating half of VS

Need log2|VS| examples to converge

Page 31: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Which Training Example Next?

<Sunny, Cold, Normal, Strong, Cool, Change> ?<Sunny, Warm, High, Light, Cool, Change> ?

Page 32: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Using VS to Classify New Example

<Sunny, Warm, Normal, Strong, Cool, Change> ?<Rainy, Cold, Normal, Light, Warm, Same> ?<Sunny, Warm, Normal, Light, Warm, Same> ?<Sunny, Cold, Normal, Strong, Warm, Same> ?

Page 33: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Using VS to Classify New Example

How to use partially learned conceptsI.e., |VS| > 1

If all of S predict positive, then positiveIf all of G predict negative, then negativeIf half and half, then don’t knowIf majority of hypotheses in VS say positive (negative), then positive (negative) with some confidence

Page 34: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Inductive Bias

How does the choice for H affect learning performance?Biased hypothesis space

EnjoySport H cannot learn constraint [Sky = Sunny or Cloudy]How about H = every possible hypothesis?

Page 35: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Unbiased Learner

H = every teachable concept (power set of X)

E.g., EnjoySport | H | = 296 = 1028 (only 973 by previous H, biased!)

H’ = arbitrary conjunctions, disjunctions or negations of hypotheses from previous H

E.g., [Sky = Sunny or Cloudy] <Sunny,?,?,?,?,?> or <Cloudy,?,?,?,?,?>

Page 36: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Unbiased Learner

Problems using H’S = disjunction of positive examplesG = negated disjunction of negative examplesThus, no generalizationEach unseen instance covered by exactly half of VS

Page 37: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Unbiased Learner

Bias-free learning is futileFundamental property of inductive learning

Learners that make no a priori assumptions about the target concept have no rational basis for classifying unseen instances

Page 38: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Inductive BiasInformally

Any preference on the space of all possible hypotheses other than consistency with training examples

FormallySet of assumptions B such that the classification of an unseen instance x by a learner L on training data D can be inferred deductively

E.g., inductive bias for CE:B = {(c ∈ H)}Classification only by unanimous decision of VS

Page 39: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Inductive Bias

Page 40: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Inductive Bias

Permits comparison of learnersRote learner

Store examples; classify x iff matches previously observed exampleNo bias

CEc ∈ H

Find-Sc ∈ Hc(x) = 0 for all instances not covered

Page 41: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

WEKA’s

ConjunctiveRule Classifier

Learns rule of the formIf A1 and A2 and … An, Then class = cA’s are inequality constraints on attributesA’s chosen based on information gain criterion

I.e., which constraint, when added, best improves classification

Lastly, performs reduced-error pruningRemove A’s from rule as long as reduces error on pruning set

If instance x not covered by rule, then c(x) = majority class of training examples not covered by ruleInductive bias?

Page 42: Concept Learning Mitchell, Chapter 2holder/courses/CptS570/fall07/slides/ch2.pdf · The general boundary G of version space VS H,D is the set of its maximally general members The

Summary

Concept learning as searchGeneral-to-specific orderingVersion spacesCandidate elimination algorithmS and G boundary sets characterize learner’s uncertaintyLearner can generate useful queriesInductive leaps possible only if learner is biased