Inductive Learning of Rules
Mushroom

Spores  Spots  Color  Edible?
Y       N      Brown  N
Y       Y      Grey   Y
N       Y      Black  Y
N       N      Brown  N
Y       N      White  N
Y       Y      Brown  Y
Y       N      Brown  ?
N       N      Red    ?
Don’t try this at home...
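A table like this can be mined for a predictive rule automatically. Below is a tiny single-attribute rule learner (in the spirit of Holte's 1R) run on the mushroom table above; the attribute names and values follow the slide, but the learner itself is an illustrative sketch, not the algorithm the lecture develops later.

```python
DATA = [  # (spores, spots, color, edible)
    ("Y", "N", "Brown", "N"),
    ("Y", "Y", "Grey",  "Y"),
    ("N", "Y", "Black", "Y"),
    ("N", "N", "Brown", "N"),
    ("Y", "N", "White", "N"),
    ("Y", "Y", "Brown", "Y"),
]
ATTRS = ["spores", "spots", "color"]

def one_rule(data):
    """Pick the attribute whose majority-class rule makes the fewest errors."""
    best = None
    for i, name in enumerate(ATTRS):
        rule, errors = {}, 0
        for value in {row[i] for row in data}:
            labels = [row[-1] for row in data if row[i] == value]
            majority = max(set(labels), key=labels.count)
            rule[value] = majority
            errors += sum(1 for label in labels if label != majority)
        if best is None or errors < best[2]:
            best = (name, rule, errors)
    return best

attr, rule, errors = one_rule(DATA)
# On this table the best single attribute is "spots":
# spots = Y -> edible, spots = N -> not edible, with zero errors.
```

The learner induces the rule "spotted mushrooms are edible" from six examples, which is exactly why you shouldn't try this at home: the rule fits the data perfectly yet may generalize badly.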
Types of Learning

What is learning?
- Improved performance over time/experience
- Increased knowledge

Speedup learning
- No change to the set of theoretically inferable facts
- Change to the speed with which the agent can infer them

Inductive learning
- More facts can be inferred
Mature Technology

Many applications:
- Detecting fraudulent credit card transactions
- Information filtering systems that learn user preferences
- Autonomous vehicles that drive public highways (ALVINN)
- Decision trees for diagnosing heart attacks
- Speech synthesis (correct pronunciation) (NETtalk)
- Data mining: huge datasets, scaling issues
Defining a Learning Problem

Three components: Task T, Performance Measure P, Experience E.

A program is said to learn from experience E with respect to task T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.
Example: Checkers

Task T: playing checkers
Performance Measure P: percent of games won against opponents
Experience E: playing practice games against itself
Example: Handwriting Recognition

Task T: recognizing and classifying handwritten words within images
Performance Measure P:
Experience E:
Example: Robot Driving

Task T: driving on a public four-lane highway using vision sensors
Performance Measure P:
Experience E:
Example: Speech Recognition

Task T: identification of a word sequence from audio recorded from arbitrary speakers ... noise
Performance Measure P:
Experience E:
Issues

- What feedback (experience) is available?
- What kind of knowledge is being increased?
- How is that knowledge represented?
- What prior information is available?
- What is the right learning algorithm?
- How do we avoid overfitting?
Choosing the Training Experience

Credit assignment problem:
- Direct training examples, e.g., individual checkers boards plus the correct move for each
- Indirect training examples, e.g., a complete sequence of moves and the final result

Which examples: random, teacher chooses, or learner chooses.

Supervised learning, reinforcement learning, unsupervised learning.
Choosing the Target Function

- What type of knowledge will be learned?
- How will the knowledge be used by the performance program?

E.g., a checkers program:
- Assume it knows the legal moves
- It needs to choose the best move
- So learn the function F: Boards -> Moves (hard to learn)
- Alternative: learn F: Boards -> R, a real-valued evaluation of boards
The Ideal Evaluation Function

V(b) = 100 if b is a final, won board
V(b) = -100 if b is a final, lost board
V(b) = 0 if b is a final, drawn board
Otherwise, if b is not final, V(b) = V(s), where s is the best final board reachable from b

V is nonoperational; we want an operational approximation V̂ of V.
How to Represent the Target Function

x1 = number of black pieces on the board
x2 = number of red pieces on the board
x3 = number of black kings on the board
x4 = number of red kings on the board
x5 = number of black pieces threatened by red
x6 = number of red pieces threatened by black

V̂(b) = w0 + w1*x1 + w2*x2 + w3*x3 + w4*x4 + w5*x5 + w6*x6

Now we just need to learn seven numbers!
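Those seven numbers can be learned by nudging each weight whenever the evaluation disagrees with a training value, which is the standard least-mean-squares approach for linear evaluation functions. The slide only asks for the representation; the update rule, learning rate, feature values, and training value of +100 below are illustrative assumptions, with the coefficients written w0..w6.

```python
def v_hat(weights, features):
    """V^(b) = w0 + w1*x1 + ... + w6*x6 for features (x1..x6)."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

def lms_update(weights, features, target, lr=0.01):
    """One least-mean-squares step toward reducing (target - V^(b))**2."""
    error = target - v_hat(weights, features)
    new = [weights[0] + lr * error]                      # bias term, x0 = 1
    new += [w + lr * error * x for w, x in zip(weights[1:], features)]
    return new

# One training step on a hypothetical board: feature values x1..x6 and a
# training value of +100 (a won position).
w = [0.0] * 7
w = lms_update(w, [12, 8, 2, 1, 0, 3], 100.0)
```

Each weight moves in proportion to how much its feature contributed to the error, so features that were zero on this board are left untouched.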
Target Function

A profound formulation: any type of inductive learning can be expressed as approximating a function.
- Checkers: V: boards -> evaluation
- Handwriting recognition: V: image -> word
- Mushrooms: V: mushroom-attributes -> {E, P}

Inductive bias
Theory of Inductive Learning

Suppose our examples are drawn with a probability distribution Pr(x), and that we learned a hypothesis f to describe a concept C.

We can define Error(f) to be:

Error(f) = Σ_{x ∈ D} Pr(x)

where D is the set of all examples on which f and C disagree.
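For a finite domain this definition can be computed exactly: sum Pr(x) over the disagreement set D. The domain, distribution, concept c, and hypothesis f below are made up for illustration.

```python
def error(f, c, pr):
    """pr maps each example x to Pr(x); D = {x : f(x) != c(x)}."""
    return sum(p for x, p in pr.items() if f(x) != c(x))

pr = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}   # a toy distribution over four examples
c = lambda x: x >= 2                     # the true concept
f = lambda x: x >= 3                     # learned hypothesis: disagrees only on x = 2
# error(f, c, pr) sums Pr(2) = 0.2, the weight of the single disagreement.
```

Note that the error weights each mistake by how likely that example is to be drawn, so rare disagreements cost little.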
PAC Learning

We're not perfect (in more than one way), so why should our programs be perfect?

What we want is: Error(f) < ε, for some chosen ε.

But sometimes we're completely clueless (hopefully with low probability). What we really want is:

Prob(Error(f) < ε) > 1 - δ.

As the number of examples grows, ε and δ should decrease.

We call this Probably Approximately Correct (PAC).
Definition of PAC Learnability

Let C be a class of concepts. We say that C is PAC learnable by a hypothesis space H if there is a polynomial-time algorithm A and a polynomial function p such that, for every c in C, every probability distribution Pr, and every ε > 0 and δ > 0, if A is given at least p(1/ε, 1/δ) examples, then A returns, with probability at least 1 - δ, a hypothesis whose error is less than ε.

k-DNF and k-CNF are PAC learnable.
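To make the polynomial p concrete, a standard bound (not stated on the slide) for any consistent learner over a finite hypothesis space H says that m >= (1/ε)(ln|H| + ln(1/δ)) examples suffice. A sketch of using it:

```python
import math

def sample_bound(h_size, epsilon, delta):
    """Standard PAC sample-size bound for a consistent learner over a
    finite hypothesis space H: m >= (1/epsilon) * (ln|H| + ln(1/delta))."""
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / epsilon)

# Example: conjunctions over n = 10 boolean attributes have |H| = 3**n
# (each attribute appears positive, negated, or not at all).
m = sample_bound(3 ** 10, epsilon=0.1, delta=0.05)
```

Because the bound depends on ln|H|, a hypothesis space that is exponentially large in n still needs only polynomially many examples, which is what makes classes like k-DNF and k-CNF PAC learnable.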
Version Spaces: A Learning Algorithm

Key idea: maintain the most specific and most general hypotheses at every point, and update them as examples come in.

We describe objects in the space by attributes:
- faculty, staff, student
- 20's, 30's, 40's
- male, female

Concepts are boolean combinations of attribute values, e.g., {faculty, 30's, male} or {female, 20's}.
Generalization and Specialization

A concept C1 is more general than C2 if it describes a superset of the objects: C1 = {20's, faculty} is more general than C2 = {20's, faculty, female}. C2 is a specialization of C1.

Immediate specializations (generalizations) are those one attribute value away.

The version space algorithm maintains the most specific and most general boundaries at every point of the learning.
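The boundary maintenance can be sketched for conjunctive concepts over the slide's three attributes (role, age, sex). Here '?' is a wildcard, S is kept as a single most-specific hypothesis, and G as a set of most-general ones; the example sequence is hypothetical, and a full candidate-elimination algorithm would also prune boundary members that become redundant.

```python
DOMAINS = [("faculty", "staff", "student"),
           ("20s", "30s", "40s"),
           ("male", "female")]

def covers(h, x):
    """h covers x if every non-wildcard attribute of h matches x."""
    return all(hv == "?" or hv == xv for hv, xv in zip(h, x))

def learn(examples):
    S = None                        # most specific: covers nothing yet
    G = [("?",) * len(DOMAINS)]     # most general: covers everything
    for x, positive in examples:
        if positive:
            # generalize S minimally to cover x; G must keep covering x
            S = x if S is None else tuple(
                sv if sv == xv else "?" for sv, xv in zip(S, x))
            G = [g for g in G if covers(g, x)]
        else:
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)
                    continue
                # minimally specialize g to exclude x, staying above S
                for i, dom in enumerate(DOMAINS):
                    if g[i] != "?":
                        continue
                    for v in dom:
                        cand = g[:i] + (v,) + g[i + 1:]
                        if v != x[i] and (S is None or covers(cand, S)):
                            new_G.append(cand)
            G = new_G
    return S, G

S, G = learn([
    (("faculty", "30s", "male"), True),
    (("student", "20s", "female"), False),
    (("faculty", "40s", "male"), True),
])
```

After these three examples S has generalized to "faculty male of any age", while G still contains two incomparable hypotheses; every concept consistent with the data lies between the two boundaries.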