Computing & Information Sciences, Kansas State University
Wednesday, 19 Nov 2008 | CIS 530 / 730: Artificial Intelligence
Lecture 34 of 42
Wednesday, 19 November 2008
William H. Hsu
Department of Computing and Information Sciences, KSU
KSOL course page: http://snipurl.com/v9v3
Course web site: http://www.kddresearch.org/Courses/Fall-2008/CIS730
Instructor home page: http://www.cis.ksu.edu/~bhsu
Reading for Next Class:
Sections 22.1, 22.6-7, Russell & Norvig 2nd edition
Genetic and Evolutionary Computation
Discussion: GA, GP
Hidden Units and Feature Extraction
Training procedure: hidden unit representations that minimize error E
Sometimes backprop will define new hidden features that are not explicit in the
input representation x, but which capture properties of the input instances that
are most relevant to learning the target function t(x)
Hidden units express newly constructed features
Change of representation to linearly separable D’
A Target Function (Sparse, aka 1-of-C, Coding)
Can this be learned? (Why or why not?)
Learning Hidden Layer Representations
Input        Output
10000000  →  10000000
01000000  →  01000000
00100000  →  00100000
00010000  →  00010000
00001000  →  00001000
00000100  →  00000100
00000010  →  00000010
00000001  →  00000001
Input        Hidden Values        Output
10000000  →  0.89  0.04  0.08  →  10000000
01000000  →  0.01  0.11  0.88  →  01000000
00100000  →  0.01  0.97  0.27  →  00100000
00010000  →  0.99  0.97  0.71  →  00010000
00001000  →  0.03  0.05  0.02  →  00001000
00000100  →  0.22  0.99  0.99  →  00000100
00000010  →  0.80  0.01  0.98  →  00000010
00000001  →  0.60  0.94  0.01  →  00000001
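The 8-3-8 encoder above can be reproduced with a few lines of backpropagation. The sketch below is a minimal plain-Python version, assuming sigmoid units, incremental (per-example) weight updates, a learning rate of 0.3, and initial weights drawn from [-0.1, 0.1]; these settings are assumptions, not the parameters behind the table, and the hidden values it learns will differ from the ones shown because the encoding depends on the random initialization.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def train_encoder(n=8, n_hidden=3, eta=0.3, epochs=5000, seed=0):
    """Train an n x n_hidden x n sigmoid network to reproduce its one-hot input.

    Returns (w_ih, w_ho, history); history[e] is the sum of squared errors
    accumulated over the n training examples during epoch e.
    """
    rng = random.Random(seed)
    # Small initial weights (network starts near-linear); the last entry of
    # each row is the bias weight w0, paired with a constant input of 1.
    w_ih = [[rng.uniform(-0.1, 0.1) for _ in range(n + 1)] for _ in range(n_hidden)]
    w_ho = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden + 1)] for _ in range(n)]
    data = [[1.0 if i == k else 0.0 for i in range(n)] for k in range(n)]
    history = []
    for _ in range(epochs):
        sse = 0.0
        for x in data:
            xb = x + [1.0]
            h = [sigmoid(sum(w * xi for w, xi in zip(row, xb))) for row in w_ih]
            hb = h + [1.0]
            o = [sigmoid(sum(w * hj for w, hj in zip(row, hb))) for row in w_ho]
            # Target t(x) = x. Output error terms first, then hidden error
            # terms, computed with the output weights before they change.
            delta_o = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o, x)]
            delta_h = [h[j] * (1 - h[j]) * sum(delta_o[k] * w_ho[k][j] for k in range(n))
                       for j in range(n_hidden)]
            for k in range(n):
                for j in range(n_hidden + 1):
                    w_ho[k][j] += eta * delta_o[k] * hb[j]
            for j in range(n_hidden):
                for i in range(n + 1):
                    w_ih[j][i] += eta * delta_h[j] * xb[i]
            sse += sum((tk - ok) ** 2 for tk, ok in zip(x, o))
        history.append(sse)
    return w_ih, w_ho, history


def hidden_code(w_ih, x):
    """Hidden unit values for input x: the learned encoding of x."""
    xb = list(x) + [1.0]
    return [sigmoid(sum(w * xi for w, xi in zip(row, xb))) for row in w_ih]


if __name__ == "__main__":
    w_ih, w_ho, history = train_encoder()
    print(f"SSE: first epoch {history[0]:.3f}, last epoch {history[-1]:.3f}")
    for k in range(8):
        x = [1.0 if i == k else 0.0 for i in range(8)]
        print("".join(str(int(v)) for v in x), [round(v, 2) for v in hidden_code(w_ih, x)])
```

Printing the hidden values for each one-hot input shows the 3-unit encoding found in that particular run.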
Training: Evolution of Error and Hidden Unit Encoding
[Plots: $error_D(o_k)$ for each output unit; hidden unit values $h_j(01000000)$, $1 \le j \le 3$]
Input-to-Hidden Unit Weights and Feature Extraction
Changes in first weight layer values correspond to changes in hidden layer
encoding and consequent output squared errors
w0 (bias weight, analogue of threshold in LTU) converges to a value near 0
Several changes in first 1000 epochs (different encodings)
[Plot: Training: Weight Evolution, $u_{i1}$, $1 \le i \le 8$]
Convergence of Backpropagation
No Guarantee of Convergence to Global Optimum Solution
Compare: perceptron convergence (to best $h \in H$, provided $h \in H$; i.e., linearly separable)
Gradient descent to some local error minimum (perhaps not global minimum…)
Possible improvements on backprop (BP)
• Momentum term (BP variant with slightly different weight update rule)
• Stochastic gradient descent (BP algorithm variant)
• Train multiple nets with different initial weights; find a good mixture
Improvements on feedforward networks
• Bayesian learning for ANNs (e.g., simulated annealing) - later
• Other global optimization methods that integrate over multiple networks
Nature of Convergence
Initialize weights near zero
Therefore, initial network near-linear
Increasingly non-linear functions possible as training progresses
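The momentum variant listed above changes the weight update so that each step also reuses a fraction α of the previous step: $\Delta w(n) = -\eta \, \partial E / \partial w + \alpha \, \Delta w(n-1)$. A minimal sketch on a one-dimensional error surface $E(w) = (w - 3)^2$; the error function, η, and α are assumptions chosen only for illustration.

```python
def gradient_descent(grad, w0, eta=0.05, alpha=0.0, steps=200):
    """Minimize a 1-D error function given its gradient.

    alpha = 0 gives plain gradient descent; alpha > 0 adds the momentum
    term alpha * (previous weight change) to every update.
    """
    w, delta_w = w0, 0.0
    for _ in range(steps):
        delta_w = -eta * grad(w) + alpha * delta_w
        w += delta_w
    return w


def grad_E(w):
    # E(w) = (w - 3)^2, so dE/dw = 2 (w - 3); the minimum is at w = 3.
    return 2.0 * (w - 3.0)


plain = gradient_descent(grad_E, w0=0.0, eta=0.05, alpha=0.0, steps=30)
momentum = gradient_descent(grad_E, w0=0.0, eta=0.05, alpha=0.5, steps=30)
print(f"after 30 steps: plain {plain:.4f}, momentum {momentum:.4f}")
```

On this surface the momentum run ends much closer to w = 3 after the same number of steps; on real error surfaces the benefit depends on η and α.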
Overtraining in ANNs
[Figure: Error versus epochs (Example 2)]
Recall: definition of overfitting: $h$ overfits if some $h'$ is worse than $h$ on $D_{train}$ but better on $D_{test}$
Overtraining: a type of overfitting due to excessive iterations
Avoidance: stopping criterion (cross-validation: holdout, k-fold)
Avoidance: weight decay
[Figure: Error versus epochs (Example 1)]
Overfitting in ANNs
Other Causes of Overfitting Possible
Number of hidden units sometimes set in advance
Too few hidden units (“underfitting”)
• ANNs with no growth
• Analogy: overdetermined linear system of equations (more equations than unknowns)
Too many hidden units
• ANNs with no pruning
• Analogy: fitting a quadratic polynomial with an approximator of degree >> 2
Solution Approaches
Prevention: attribute subset selection (using pre-filter or wrapper)
Avoidance
• Hold out cross-validation (CV) set or split k ways (when to stop?)
• Weight decay: decrease each weight by some factor on each epoch
Detection/recovery: random restarts, addition and deletion of weights, units
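The two avoidance strategies above (hold-out validation and weight decay) can be sketched in a few lines. The helper `early_stopping_epoch` is hypothetical: it keeps the epoch with the lowest hold-out error and stops once that error has not improved for `patience` consecutive epochs (the patience rule is an assumption; the slide only asks "when to stop?"). The helper `decay_weights`, also hypothetical, shrinks each weight by a factor (1 − γ) before the usual gradient step.

```python
def early_stopping_epoch(val_errors, patience=2):
    """Return the index of the epoch to keep, given per-epoch hold-out errors."""
    best_epoch, best_error, waited = 0, float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best_error:
            best_epoch, best_error, waited = epoch, err, 0
        else:
            waited += 1
            if waited >= patience:
                break  # no improvement for `patience` epochs: stop training
    return best_epoch


def decay_weights(weights, grads, eta=0.1, gamma=0.01):
    """One update with weight decay: shrink each weight by (1 - gamma), then step."""
    return [(1.0 - gamma) * w - eta * g for w, g in zip(weights, grads)]


# Hold-out error falls, then rises as the network starts to overtrain.
print(early_stopping_epoch([0.50, 0.40, 0.35, 0.37, 0.36, 0.40, 0.45]))  # 2
print(decay_weights([1.0, -2.0], [0.0, 0.0], gamma=0.1))
```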
Example: Neural Nets for Face Recognition
90% Accurate Learning Head Pose, Recognizing 1-of-20 Faces
http://www.cs.cmu.edu/~tom/faces.html
[Figure: 30 x 32 inputs; outputs Left, Straight, Right, Up; hidden layer weights after 1 epoch and after 25 epochs; output layer weights (including the bias weight w0) after 1 epoch]
Example: NetTalk
Sejnowski and Rosenberg, 1987
Early Large-Scale Application of Backprop
Learning to convert text to speech
• Acquired model: a mapping from letters to phonemes and stress marks
• Output passed to a speech synthesizer
Good performance after training on a vocabulary of ~1000 words
Very Sophisticated Input-Output Encoding
Input: 7-letter window; determines the phoneme for the center letter and context on each side; distributed (i.e., sparse) representation: 200 bits
Output: units for articulatory modifiers (e.g., “voiced”), stress, closest phoneme; distributed representation
40 hidden units; 10000 weights total
Experimental Results
Vocabulary: trained on 1024 of 1463 (informal) and 1000 of 20000 (dictionary)
78% on informal, ~60% on dictionary
http://en.wikipedia.org/wiki/NETtalk_(artificial_neural_network)
NeuroSolutions Demo
PAC Learning: Definition and Rationale
Intuition
Can’t expect a learner to learn exactly
• Multiple consistent concepts
• Unseen examples: could have any label (“OK” to mislabel if “rare”)
Can’t always approximate c closely (probability of D not being representative)
Terms Considered
Class C of possible concepts, learner L, hypothesis space H
Instances X, each described by n attributes
Error parameter $\epsilon$, confidence parameter $\delta$, true error $error_D(h)$
size(c) = the encoding length of c, assuming some representation
Definition
C is PAC-learnable by L using H if for all $c \in C$, distributions D over X, $\epsilon$ such that $0 < \epsilon < 1/2$, and $\delta$ such that $0 < \delta < 1/2$, learner L will, with probability at least $(1 - \delta)$, output a hypothesis $h \in H$ such that $error_D(h) \le \epsilon$
Efficiently PAC-learnable: L runs in time polynomial in $1/\epsilon$, $1/\delta$, n, size(c)
PAC Learning: Results for Two Hypothesis Languages
Unbiased Learner
Recall: sample complexity bound $m \ge \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)$
Sample complexity not always polynomial
Example: for unbiased learner, $|H| = 2^{|X|}$
Suppose X consists of n booleans (binary-valued attributes)
• $|X| = 2^n$, $|H| = 2^{2^n}$
• $m \ge \frac{1}{\epsilon}\left(2^n \ln 2 + \ln\frac{1}{\delta}\right)$
• Sample complexity for this H is exponential in n
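To see the exponential growth concretely, the bound can be evaluated directly. A minimal sketch; the function names and the choice ε = 0.1, δ = 0.05 are illustrative, and ln|H| is computed as $2^n \ln 2$ rather than by forming |H| itself, which would be astronomically large.

```python
import math


def sample_complexity(ln_h, epsilon, delta):
    """Smallest integer m with m >= (1/epsilon) * (ln|H| + ln(1/delta))."""
    return math.ceil((ln_h + math.log(1.0 / delta)) / epsilon)


def ln_unbiased_h(n):
    """ln|H| for the unbiased learner over n boolean attributes: |H| = 2^(2^n)."""
    return (2 ** n) * math.log(2)


for n in (5, 10, 20):
    print(n, sample_complexity(ln_unbiased_h(n), epsilon=0.1, delta=0.05))
```

Going from n = 10 to n = 20 multiplies the required number of examples by roughly a thousand.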
Monotone Conjunctions
Target function of the form $y = f(x_1, \ldots, x_n) = x'_1 \wedge \cdots \wedge x'_k$
Active learning protocol (learner gives query instances): n examples needed
Passive learning with a helpful teacher: k examples (k literals in true concept)
Passive learning with randomly selected examples (proof to follow):
$m \ge \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right) = \frac{1}{\epsilon}\left(\ln n + \ln\frac{1}{\delta}\right)$
PAC Learning: Monotone Conjunctions [1]
Monotone Conjunctive Concepts
Suppose $c \in C$ (and $h \in H$) is of the form $x_1 \wedge x_2 \wedge \cdots \wedge x_m$
n possible variables: either omitted or included (i.e., positive literals only)
Errors of Omission (False Negatives)
Claim: the only possible errors are false negatives ($h(x) = -$, $c(x) = +$)
Mistake iff $(z \in h) \wedge (z \notin c) \wedge (\exists x \in D_{test} \,.\, x(z) = false)$: then $h(x) = -$, $c(x) = +$
Probability of False Negatives
Let z be a literal; let Pr(Z) be the probability that z is false in a positive $x \in D$
z in target concept (correct conjunction $c = x_1 \wedge x_2 \wedge \cdots \wedge x_m$) $\Rightarrow$ Pr(Z) = 0
Pr(Z) is the probability that a randomly chosen positive example has z = false
(inducing a potential mistake, or deleting z from h if training is still in progress)
$error(h) \le \sum_{z \in h} Pr(Z)$
[Figure: instance space X with regions c and h; positive (+) and negative (−) examples]
PAC Learning: Monotone Conjunctions [2]
Bad Literals
Call a literal z bad if $Pr(Z) > \epsilon = \epsilon'/n$
z does not belong in h, and is likely to be dropped (by appearing with value true
in a positive $x \in D$), but has not yet appeared in such an example
Case of No Bad Literals
Lemma: if there are no bad literals, then $error(h) \le \epsilon'$
Proof: $error(h) \le \sum_{z \in h} Pr(Z) \le \sum_{z \in h} \epsilon'/n \le \epsilon'$ (worst case: all n literals are in h)
Case of Some Bad Literals
Let z be a bad literal
Survival probability (probability that it will not be eliminated by a given
example): $1 - Pr(Z) < 1 - \epsilon'/n$
Survival probability over m examples: $(1 - Pr(Z))^m < (1 - \epsilon'/n)^m$
Worst case over m examples (at most n bad literals): probability that some bad literal survives $\le n(1 - \epsilon'/n)^m$
Intuition: more chance of a mistake = greater chance to learn
PAC Learning: Monotone Conjunctions [3]
Goal: Achieve an Upper Bound for Worst-Case Survival Probability
Choose m large enough so that the probability of a bad literal z surviving across m examples is less than $\delta$
$Pr(\text{some bad literal survives } m \text{ examples}) \le n(1 - \epsilon'/n)^m < \delta$
Solve for m using the inequality $1 - x < e^{-x}$
• $n e^{-m\epsilon'/n} < \delta$
• $m > \frac{n}{\epsilon'}\left(\ln n + \ln\frac{1}{\delta}\right)$ examples needed to guarantee the bounds
This completes the proof of the PAC result for monotone conjunctions
Nota Bene: a specialization of $m \ge \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)$; $n/\epsilon' = 1/\epsilon$
Practical Ramifications
Suppose $\delta = 0.1$, $\epsilon' = 0.1$, $n = 100$: we need 6908 examples
Suppose $\delta = 0.1$, $\epsilon' = 0.1$, $n = 10$: we need only 461 examples
Suppose $\delta = 0.01$, $\epsilon' = 0.1$, $n = 10$: we need only 691 examples
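The three figures above come from evaluating $m > \frac{n}{\epsilon'}\left(\ln n + \ln\frac{1}{\delta}\right)$ and taking the smallest integer that satisfies the strict inequality. A short check (the function name is illustrative):

```python
import math


def monotone_conjunction_examples(n, eps_prime, delta):
    """Smallest integer m with m > (n / eps') * (ln n + ln(1/delta))."""
    bound = (n / eps_prime) * (math.log(n) + math.log(1.0 / delta))
    return math.floor(bound) + 1


for n, eps_prime, delta in [(100, 0.1, 0.1), (10, 0.1, 0.1), (10, 0.1, 0.01)]:
    print(n, eps_prime, delta, monotone_conjunction_examples(n, eps_prime, delta))
```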
PAC Learning: k-CNF, k-Clause-CNF, k-DNF, k-Term-DNF
k-CNF (Conjunctive Normal Form) Concepts: Efficiently PAC-Learnable
Conjunctions of any number of disjunctive clauses, each with at most k literals
$c = C_1 \wedge C_2 \wedge \cdots \wedge C_m$; $C_i = l_1 \vee l_2 \vee \cdots \vee l_k$; $\ln|k\text{-CNF}| = \ln 2^{(2n)^k} = \Theta(n^k)$
Algorithm: reduce to learning monotone conjunctions over $n^k$ pseudo-literals $C_i$
k-Clause-CNF
$c = C_1 \wedge C_2 \wedge \cdots \wedge C_k$; $C_i = l_1 \vee l_2 \vee \cdots \vee l_m$; $\ln|k\text{-Clause-CNF}| = \ln 3^{kn} = \Theta(kn)$
Efficiently PAC learnable? See below (k-Clause-CNF, k-Term-DNF are duals)
k-DNF (Disjunctive Normal Form)
Disjunctions of any number of conjunctive terms, each with at most k literals
$c = T_1 \vee T_2 \vee \cdots \vee T_m$; $T_i = l_1 \wedge l_2 \wedge \cdots \wedge l_k$
k-Term-DNF: “Not” Efficiently PAC-Learnable (Kind Of, Sort Of…)
$c = T_1 \vee T_2 \vee \cdots \vee T_k$; $T_i = l_1 \wedge l_2 \wedge \cdots \wedge l_m$; $\ln|k\text{-Term-DNF}| \le \ln 3^{kn} = \Theta(kn)$
Polynomial sample complexity, not computational complexity (unless RP = NP)
Solution: Don’t use H = C! $k\text{-Term-DNF} \subseteq k\text{-CNF}$ (so let H = k-CNF)
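The reduction of k-CNF to monotone conjunctions can be made concrete: treat every clause of at most k literals as one pseudo-literal and drop any clause that a positive example makes false. A sketch under stated assumptions: instances are tuples of booleans, a literal is an (index, polarity) pair, the helper names are hypothetical, and the target concept is made up for illustration.

```python
from itertools import combinations, product


def all_clauses(n, k):
    """Every disjunction of 1..k literals over x_0..x_{n-1}; a literal is (index, polarity)."""
    literals = [(i, polarity) for i in range(n) for polarity in (True, False)]
    return [frozenset(c) for size in range(1, k + 1) for c in combinations(literals, size)]


def clause_true(clause, x):
    return any(x[i] == polarity for i, polarity in clause)


def learn_k_cnf(examples, n, k):
    """Start from the conjunction of all clauses; drop every clause a positive example falsifies."""
    h = set(all_clauses(n, k))
    for x, label in examples:
        if label:
            h = {c for c in h if clause_true(c, x)}
    return h


def predict(h, x):
    return all(clause_true(c, x) for c in h)


def target(x):
    # Assumed target concept: (x0 OR x1) AND NOT x2
    return (x[0] or x[1]) and not x[2]


instances = list(product([False, True], repeat=3))
h = learn_k_cnf([(x, target(x)) for x in instances], n=3, k=2)
print(all(predict(h, x) == target(x) for x in instances))  # True
```

The number of candidate clauses grows as $O((2n)^k)$, which is polynomial in n for fixed k; this is what keeps the reduction efficient.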
Consistent Learners
General Scheme for Learning
Follows immediately from definition of consistent hypothesis
Given: a sample D of m examples
Find: some h H that is consistent with all m examples
PAC: show that if m is large enough, a consistent hypothesis must be close
enough to c
Efficient PAC (and other COLT formalisms): show that you can compute the
consistent hypothesis efficiently
Monotone Conjunctions
Used an Elimination algorithm (compare: Find-S) to find a hypothesis h that is
consistent with the training set (easy to compute)
Showed that with sufficiently many examples (polynomial in the parameters),
then h is close to c
Sample complexity gives an assurance of “convergence to criterion” for
specified m, and a necessary condition (polynomial in n) for tractability
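The Elimination algorithm for monotone conjunctions can be sketched as follows, assuming instances are tuples of 0/1 values and a hypothesis is stored as the set of variable indices it conjoins. The names and the tiny training set (target $c = x_0 \wedge x_2$, n = 4) are illustrative.

```python
def eliminate(examples, n):
    """Elimination for monotone conjunctions over x_0..x_{n-1}.

    h starts as x_0 AND ... AND x_{n-1} (most specific) and drops every
    variable that is false in some positive example; negatives are ignored.
    """
    h = set(range(n))
    for x, label in examples:
        if label:
            h -= {i for i in h if not x[i]}
    return h


def predict(h, x):
    return all(x[i] for i in h)


train = [((1, 1, 1, 0), True), ((1, 0, 1, 1), True), ((0, 1, 1, 1), False)]
h = eliminate(train, n=4)
print(sorted(h))  # [0, 2]
```

Because h only ever shrinks toward c, it stays consistent with every training example, which is exactly the property the PAC argument needs.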
VC Dimension: Framework
Infinite Hypothesis Space?
Preceding analyses were restricted to finite hypothesis spaces
Some infinite hypothesis spaces are more expressive than others, e.g.,
• rectangles vs. 17-sided convex polygons vs. general convex polygons
• linear threshold (LT) function vs. a conjunction of LT units
Need a measure of the expressiveness of an infinite H other than its size
Vapnik-Chervonenkis Dimension: VC(H)
Provides such a measure
Analogous to | H |: there are bounds for sample complexity using VC(H)
VC Dimension: Shattering A Set of Instances
Dichotomies
Recall: a partition of a set S is a collection of disjoint sets Si whose union is S
Definition: a dichotomy of a set S is a partition of S into two subsets S1 and S2
Shattering
A set of instances S is shattered by hypothesis space H if and only if for every dichotomy of S, there exists a hypothesis in
H consistent with this dichotomy
Intuition: a rich set of functions shatters a larger instance space
The “Shattering Game” (An Adversarial Interpretation)
Your client selects an S (an instance space X)
You select an H
Your adversary labels S (i.e., chooses a point c from concept space $C = 2^X$)
You must then find some $h \in H$ that “covers” (is consistent with) c
If you can do this for any c your adversary comes up with, H shatters S
VC Dimension: Examples of Shattered Sets
Three Instances Shattered
Intervals
Left-bounded intervals on the real axis: $[0, a)$, for $a \in \mathbb{R}$, $a \ge 0$
• Sets of 2 points cannot be shattered
• Given 2 points, can label so that no hypothesis will be consistent
Intervals on the real axis ($[a, b]$, $a, b \in \mathbb{R}$, $b > a$): can shatter 1 or 2 points, not 3
Half-spaces in the plane (non-collinear): 1? 2? 3? 4?
[Figure: instance space X; example +/− labelings of points on the real line for the intervals $[0, a)$ and $[a, b]$]
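Shattering for these two interval classes can be checked by brute force: enumerate every dichotomy of a point set and ask whether some hypothesis realizes it. The sketch assumes distinct points (and nonnegative points for $[0, a)$); the helper names are hypothetical. A labeling is realizable by $[a, b]$ exactly when the positive points are contiguous in sorted order, and by $[0, a)$ exactly when every positive point lies to the left of every negative point.

```python
from itertools import product


def interval_can_realize(points, labels):
    """Closed interval [a, b]: the positive points must be contiguous in sorted order."""
    ordered = [lab for _, lab in sorted(zip(points, labels))]
    pos_idx = [i for i, lab in enumerate(ordered) if lab]
    if not pos_idx:
        return True  # place the interval away from every point
    return all(ordered[i] for i in range(pos_idx[0], pos_idx[-1] + 1))


def left_bounded_can_realize(points, labels):
    """[0, a): sorted labels must look like +...+ -...- (points assumed >= 0)."""
    ordered = [lab for _, lab in sorted(zip(points, labels))]
    seen_negative = False
    for lab in ordered:
        if lab and seen_negative:
            return False
        if not lab:
            seen_negative = True
    return True


def shatters(can_realize, points):
    """True if every dichotomy of `points` is realized by some hypothesis."""
    return all(can_realize(points, labels)
               for labels in product([False, True], repeat=len(points)))


print(shatters(interval_can_realize, [1.0, 2.0]), shatters(interval_can_realize, [1.0, 2.0, 3.0]))
print(shatters(left_bounded_can_realize, [1.0]), shatters(left_bounded_can_realize, [1.0, 2.0]))
```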
Lecture Outline
Readings for Friday
Finish Chapter 20, Russell and Norvig 2e
Suggested: Chapter 1, 6.1-6.5, Goldberg; 9.1 – 9.4, Mitchell
Evolutionary Computation
Biological motivation: process of natural selection
Framework for search, optimization, and learning
Prototypical (Simple) Genetic Algorithm
Components: selection, crossover, mutation
Representing hypotheses as individuals in GAs
An Example: GA-Based Inductive Learning (GABIL)
GA Building Blocks (aka Schemas)
Taking Stock (Course Review)
Simple Genetic Algorithm (SGA)
Algorithm Simple-Genetic-Algorithm (Fitness, Fitness-Threshold, p, r, m)
// p: population size; r: replacement rate (aka generation gap width); m: mutation rate
P ← p random hypotheses // initialize population
FOR each h in P DO f[h] ← Fitness(h) // evaluate Fitness: hypothesis → R
WHILE (Max(f) < Fitness-Threshold) DO
1. Select: Probabilistically select (1 - r)p members of P to add to PS
2. Crossover:
Probabilistically select (r · p)/2 pairs of hypotheses from P
FOR each pair <h1, h2> DO
PS += Crossover (<h1, h2>) // PS[t+1] = PS[t] + <offspring1, offspring2>
3. Mutate: Invert a randomly selected bit in m · p random members of PS
4. Update: P ← PS
5. Evaluate: FOR each h in P DO f[h] ← Fitness(h)
RETURN the hypothesis h in P that has maximum fitness f[h]
Selection probability: $P(h_i) = \frac{f(h_i)}{\sum_{j=1}^{p} f(h_j)}$
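A runnable sketch of the pseudocode above, for bit strings stored as Python lists of 0/1. Assumptions not in the pseudocode: single-point crossover, a `max_generations` safety stop, nonnegative fitness, and a fallback to uniform selection when every fitness is 0. The OneMax fitness (number of 1 bits) in the usage line is a standard toy problem chosen only for illustration.

```python
import random


def simple_ga(fitness, fitness_threshold, p, r, m, length, max_generations=1000, seed=0):
    """Simple GA over bit strings; p: population size, r: replacement rate, m: mutation rate."""
    rng = random.Random(seed)
    P = [[rng.randint(0, 1) for _ in range(length)] for _ in range(p)]
    f = [fitness(h) for h in P]
    generation = 0
    while max(f) < fitness_threshold and generation < max_generations:
        # Fitness-proportionate weights, P(h_i) = f(h_i) / sum_j f(h_j).
        weights = f if sum(f) > 0 else None
        # 1. Select: copy (1 - r) p members of P unchanged into PS.
        n_keep = round((1 - r) * p)
        PS = [h[:] for h in rng.choices(P, weights=weights, k=n_keep)]
        # 2. Crossover: single-point, two offspring per selected pair.
        while len(PS) < p:
            h1, h2 = rng.choices(P, weights=weights, k=2)
            point = rng.randint(1, length - 1)
            PS.append(h1[:point] + h2[point:])
            PS.append(h2[:point] + h1[point:])
        PS = PS[:p]  # drop a surplus offspring when r * p is odd
        # 3. Mutate: flip one random bit in m * p random members of PS.
        for h in rng.sample(PS, round(m * p)):
            i = rng.randrange(length)
            h[i] = 1 - h[i]
        # 4. Update and 5. Evaluate.
        P = PS
        f = [fitness(h) for h in P]
        generation += 1
    best = max(range(p), key=lambda i: f[i])
    return P[best], f[best], generation


best, best_f, gens = simple_ga(sum, fitness_threshold=10, p=40, r=0.6, m=0.1, length=10)
print(best, best_f, gens)
```

Copying selected members (`h[:]`) keeps later in-place mutation from altering several population slots that share one list.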
GA-Based Inductive Learning (GABIL)
GABIL System [DeJong et al., 1993]
Given: concept learning problem and examples
Learn: disjunctive set of propositional rules
Goal: results competitive with those for current decision tree learning
algorithms (e.g., C4.5)
Fitness Function: $Fitness(h) = (Correct(h))^2$
Representation
Rules: IF $a_1 = T \wedge a_2 = F$ THEN $c = T$; IF $a_2 = T$ THEN $c = F$
Bit string encoding: a1 [10] . a2 [01] . c [1] . a1 [11] . a2 [10] . c [0] = 10011 11100
Genetic Operators
Want variable-length rule sets
Want only well-formed bit string hypotheses
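The bit-string encoding above can be generated mechanically. A minimal sketch, assuming two boolean attributes a1 and a2, a bit order of (T, F) within each attribute substring, a 1 bit meaning "value allowed", and c = T encoded as 1; the function names are hypothetical.

```python
VALUES = ("T", "F")  # assumed bit order within each attribute substring


def encode_rule(conditions, c_value, attributes=("a1", "a2")):
    """conditions maps attribute -> set of allowed values; a missing attribute is "don't care"."""
    bits = ""
    for a in attributes:
        allowed = conditions.get(a, set(VALUES))
        bits += "".join("1" if v in allowed else "0" for v in VALUES)
    return bits + ("1" if c_value == "T" else "0")


def encode_rule_set(rules):
    return " ".join(encode_rule(conditions, c) for conditions, c in rules)


# IF a1 = T AND a2 = F THEN c = T;  IF a2 = T THEN c = F
rules = [({"a1": {"T"}, "a2": {"F"}}, "T"), ({"a2": {"T"}}, "F")]
print(encode_rule_set(rules))  # 10011 11100
```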
Crossover: Variable-Length Bit Strings
Basic Representation
Start with
     a1 a2 c  a1 a2 c
h1   1[0 01 1 11 1]0 0
h2   0[1 1]1 0 10 01 0
Idea: allow crossover to produce variable-length offspring
Procedure
1. Choose crossover points for h1, e.g., after bits 1, 8
2. Now restrict crossover points in h2 to those that produce bitstrings with well-defined semantics, e.g., <1, 3>, <1, 8>, <6, 8>
Example
Suppose we choose <1, 3>
Result
h3   11 10 0
h4   00 01 1 11 11 0 10 01 0
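The restriction on h2's crossover points can be computed: a point is admissible when it sits at the same offset within a rule as the corresponding point in h1. A sketch assuming the 5-bit rule layout above (a1: 2 bits, a2: 2 bits, c: 1 bit) and bit strings written without spaces; the function names are illustrative. It reproduces the candidate pairs and the offspring h3, h4 listed above.

```python
RULE_LEN = 5  # a1 (2 bits) + a2 (2 bits) + c (1 bit)


def admissible_points(d1, d2, len_h2, rule_len=RULE_LEN):
    """Crossover point pairs in h2 whose within-rule offsets match d1, d2 in h1."""
    firsts = [p for p in range(len_h2 + 1) if p % rule_len == d1 % rule_len]
    seconds = [p for p in range(len_h2 + 1) if p % rule_len == d2 % rule_len]
    return [(p1, p2) for p1 in firsts for p2 in seconds if p1 < p2]


def crossover(h1, h2, points1, points2):
    """Swap the segment between points1 in h1 with the segment between points2 in h2."""
    (a, b), (c, d) = points1, points2
    return h1[:a] + h2[c:d] + h1[b:], h2[:c] + h1[a:b] + h2[d:]


h1 = "1001111100"  # 10 01 1 11 10 0
h2 = "0111010010"  # 01 11 0 10 01 0
print(admissible_points(1, 8, len(h2)))  # [(1, 3), (1, 8), (6, 8)]
h3, h4 = crossover(h1, h2, (1, 8), (1, 3))
print(h3, h4)  # 11100 000111111010010
```

Both offspring have lengths that are multiples of 5, so each still decodes into whole rules.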
GABIL Extensions
New Genetic Operators
Applied probabilistically
1. AddAlternative: generalize constraint on ai by changing a 0 to a 1
2. DropCondition: generalize constraint on ai by changing every 0 to a 1
New Field
Add fields to bit string to decide whether to allow above operators
a1 a2 c  a1 a2 c  AA DC
01 11 0  10 01 0  1  0
So now the learning strategy also evolves!
aka genetic wrapper
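The two generalization operators act on one attribute's substring within a rule. A minimal sketch, assuming the substring is located by a start index and width (for example start 0, width 2 for a1 in a 5-bit rule); the helper names are hypothetical, and the AA/DC control bits are not modeled here.

```python
import random


def add_alternative(rule, start, width, rng=random):
    """AddAlternative: change one randomly chosen 0 to 1 in rule[start:start + width]."""
    zeros = [i for i in range(start, start + width) if rule[i] == "0"]
    if not zeros:
        return rule  # constraint already allows every value
    i = rng.choice(zeros)
    return rule[:i] + "1" + rule[i + 1:]


def drop_condition(rule, start, width):
    """DropCondition: change every bit of rule[start:start + width] to 1 ("don't care")."""
    return rule[:start] + "1" * width + rule[start + width:]


rule = "10011"  # IF a1 = T AND a2 = F THEN c = T
print(add_alternative(rule, 2, 2))  # 10111: a2 may now be T or F
print(drop_condition(rule, 0, 2))   # 11011: a1 is no longer tested
```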
GABIL Results
Classification Accuracy
Compared to symbolic rule/tree learning methods
C4.5 [Quinlan, 1993]
ID5R
AQ14 [Michalski, 1986]
Performance of GABIL comparable
Average performance on a set of 12 synthetic problems: 92.1% test
accuracy
Symbolic learning methods ranged from 91.2% to 96.6%
Effect of Generalization Operators
Result above is for GABIL without AA and DC
Average test set accuracy on 12 synthetic problems with AA and DC: 95.2%
Building Blocks (Schemas)
Problem
How to characterize evolution of population in GA?
Goal
Identify basic building block of GAs
Describe family of individuals
Definition: Schema
String containing 0, 1, * (“don’t care”)
Typical schema: 10**0*
Instances of above schema: 101101, 100000, …
Solution Approach
Characterize population by number of instances representing each schema
m(s, t) number of instances of schema s in population at time t
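Schema membership and the count m(s, t) are straightforward to compute. A sketch assuming schemas and individuals are equal-length strings over '0' and '1' (plus '*' in schemas); the function names and the sample population are made up for illustration.

```python
def matches(schema, h):
    """True if bit string h is an instance of schema (a string over 0, 1, *)."""
    return len(schema) == len(h) and all(s in ("*", b) for s, b in zip(schema, h))


def count_instances(schema, population):
    """m(s, t): number of instances of schema s in the population at time t."""
    return sum(matches(schema, h) for h in population)


population = ["101101", "100000", "111101", "001000"]
print([h for h in population if matches("10**0*", h)])  # ['101101', '100000']
print(count_instances("10**0*", population))  # 2
```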
Selection and Building Blocks
Restricted Case: Selection Only
$\bar{f}(t)$: average fitness of population at time t
m(s, t): number of instances of schema s in population at time t
$\hat{u}(s, t)$: average fitness of instances of schema s at time t
Quantities of Interest
Probability of selecting h in one selection step: $P(h) = \frac{f(h)}{\sum_{i=1}^{n} f(h_i)} = \frac{f(h)}{n \bar{f}(t)}$
Probability of selecting an instance of s in one selection step: $P(h \in s) = \sum_{h \in s \cap p_t} \frac{f(h)}{n \bar{f}(t)} = \frac{\hat{u}(s, t)}{n \bar{f}(t)} m(s, t)$
Expected number of instances of s after n selections: $E[m(s, t+1)] = \frac{\hat{u}(s, t)}{\bar{f}(t)} m(s, t)$
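The selection-only result can be checked numerically: computing $E[m(s, t+1)]$ from $\hat{u}(s, t)$, $\bar{f}(t)$, and m(s, t) gives the same value as summing selection probabilities over n selections directly. The population, schema, and fitness function below are arbitrary choices for illustration.

```python
def matches(schema, h):
    return len(schema) == len(h) and all(s in ("*", b) for s, b in zip(schema, h))


def expected_instances_after_selection(schema, population, fitness):
    """E[m(s, t+1)] = (u_hat(s, t) / f_bar(t)) * m(s, t), selection only."""
    fs = [fitness(h) for h in population]
    f_bar = sum(fs) / len(population)
    inst = [fh for h, fh in zip(population, fs) if matches(schema, h)]
    if not inst:
        return 0.0
    u_hat = sum(inst) / len(inst)
    return u_hat / f_bar * len(inst)


def expected_by_direct_sum(schema, population, fitness):
    """n selections, each picking an instance of s with probability sum_{h in s} f(h) / sum f."""
    fs = [fitness(h) for h in population]
    p_s = sum(fh for h, fh in zip(population, fs) if matches(schema, h)) / sum(fs)
    return len(population) * p_s


def fitness(h):
    # Assumed fitness: number of 1 bits plus one, so every string has f > 0.
    return h.count("1") + 1


pop = ["1100", "1000", "0111", "0000"]
print(expected_instances_after_selection("1***", pop, fitness))  # 2.0
print(expected_by_direct_sum("1***", pop, fitness))  # 2.0
```

Here the schema's average fitness equals the population average (2.5), so its expected count stays at m(s, t) = 2.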
Schema Theorem
Theorem
$E[m(s, t+1)] \ge \frac{\hat{u}(s, t)}{\bar{f}(t)} m(s, t) \left(1 - p_c \frac{d(s)}{l - 1}\right) (1 - p_m)^{o(s)}$
m(s, t): number of instances of schema s in population at time t
$\bar{f}(t)$: average fitness of population at time t
$\hat{u}(s, t)$: average fitness of instances of schema s at time t
$p_c$: probability of single-point crossover operator
$p_m$: probability of mutation operator
l: length of individual bit strings
o(s): number of defined (non “*”) bits in s
d(s): distance between rightmost, leftmost defined bits in s
Intuitive Meaning
“The expected number of instances of a schema in the population tends toward its relative fitness”
A fundamental theorem of GA analysis and design
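A sketch that evaluates the right-hand side of the Schema Theorem, including o(s) and d(s) for a schema string. The numeric inputs in the example call (m(s, t) = 10, $\hat{u}(s, t)$ = 1.2, $\bar{f}(t)$ = 1.0, $p_c$ = 0.6, $p_m$ = 0.01) are assumptions chosen only to show the arithmetic.

```python
def order(schema):
    """o(s): number of defined (non-*) bits."""
    return sum(c != "*" for c in schema)


def defining_length(schema):
    """d(s): distance between the leftmost and rightmost defined bits."""
    defined = [i for i, c in enumerate(schema) if c != "*"]
    return defined[-1] - defined[0] if defined else 0


def schema_theorem_bound(schema, m_st, u_hat, f_bar, p_c, p_m):
    """Lower bound on E[m(s, t+1)] given by the Schema Theorem."""
    length = len(schema)
    survives_crossover = 1 - p_c * defining_length(schema) / (length - 1)
    survives_mutation = (1 - p_m) ** order(schema)
    return (u_hat / f_bar) * m_st * survives_crossover * survives_mutation


s = "10**0*"
print(order(s), defining_length(s))  # 3 4
print(round(schema_theorem_bound(s, m_st=10, u_hat=1.2, f_bar=1.0, p_c=0.6, p_m=0.01), 4))  # 6.0547
```

Selection alone would predict 12 instances; disruption by crossover (d(s) = 4 of l − 1 = 5 cut points) and mutation lowers the guaranteed count to about 6.05.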
Genetic Programming
Readings / Viewings
View GP videos 1-3
GP1 – Genetic Programming: The Video
GP2 – Genetic Programming: The Next Generation
GP3 – Genetic Programming: Invention
GP4 – Genetic Programming: Human-Competitive
Suggested: Chapters 1-5, Koza
Previously
Genetic and evolutionary computation (GEC)
Generational vs. steady-state GAs; relation to simulated annealing, MCMC
Schema theory and GA engineering overview
Today: GP Discussions
Code bloat and potential mitigants: types, OOP, parsimony, optimization, reuse
Genetic programming vs. human programming: similarities, differences
GP Flow Graph
Adapted from The Genetic Programming Notebook © 2002 Jaime J. Fernandez, http://www.geneticprogramming.com
Structural Crossover
Adapted from The Genetic Programming Notebook © 2002 Jaime J. Fernandez, http://www.geneticprogramming.com
Structural Mutation
Adapted from The Genetic Programming Notebook © 2002 Jaime J. Fernandez, http://www.geneticprogramming.com
Terminology
Evolutionary Computation (EC): Models Based on Natural Selection
Genetic Algorithm (GA) Concepts
Individual: single entity of model (corresponds to hypothesis)
Population: collection of entities in competition for survival
Generation: single application of selection and crossover operations
Schema aka building block: descriptor of GA population (e.g., 10**0*)
Schema theorem: representation of schema proportional to its relative fitness
Simple Genetic Algorithm (SGA) Steps
Selection
Proportionate (aka roulette wheel): $P(individual) \propto f(individual)$
Tournament: let individuals compete in pairs or tuples; eliminate unfit ones
Crossover
Single-point: 11101001000 × 00001010101 → {11101010101, 00001001000}
Two-point: 11101001000 × 00001010101 → {11001011000, 00101000101}
Uniform: 11101001000 × 00001010101 → {10001000100, 01101011001}
Mutation: single-point (“bit flip”), multi-point
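All three crossover operators can be expressed with a bit mask: one offspring takes a parent-1 bit where the mask is 1 and a parent-2 bit where it is 0, and the other offspring does the reverse. The masks below are chosen so that the offspring match the examples listed; they are an illustration, not part of the original slide.

```python
def mask_crossover(parent1, parent2, mask):
    """Return the two offspring defined by a crossover mask over equal-length bit strings."""
    child1 = "".join(a if bit == "1" else b for a, b, bit in zip(parent1, parent2, mask))
    child2 = "".join(b if bit == "1" else a for a, b, bit in zip(parent1, parent2, mask))
    return child1, child2


p1, p2 = "11101001000", "00001010101"
print(mask_crossover(p1, p2, "11111000000"))  # single-point: ('11101010101', '00001001000')
print(mask_crossover(p1, p2, "00111110000"))  # two-point: ('00101000101', '11001011000')
print(mask_crossover(p1, p2, "10011010011"))  # uniform: ('10001000100', '01101011001')
```

A single-point mask is a run of 1s followed by 0s, a two-point mask has one interior run of 1s, and a uniform mask is drawn bit by bit at random.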
Summary Points
Evolutionary Computation
Motivation: process of natural selection
Limited population; individuals compete for membership
Method for parallel, stochastic search
Framework for problem solving: search, optimization, learning
Prototypical (Simple) Genetic Algorithm (GA)
Steps
Selection: reproduce individuals probabilistically, in proportion to fitness
Crossover: generate new individuals probabilistically, from pairs of “parents”
Mutation: modify structure of individual randomly
How to represent hypotheses as individuals in GAs
An Example: GA-Based Inductive Learning (GABIL)
Schema Theorem: Propagation of Building Blocks
Next Lecture: Genetic Programming, The Movie