CS 416 Artificial Intelligence
Lecture 7
Optimization
Tomorrow’s Demos
TA will be at Thornton Stacks
Get your stuff set up to demo
Find Chris and let him know you’re ready
Hill Climbing
Continuous spaces
• Derivative tells you “downhill” direction
Hill Climbing
Multidimensional continuous spaces
• Derivative computation (the gradient) defines a tangent hyperplane
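A minimal sketch of continuous hill climbing in its "downhill" (minimization) form, i.e. gradient descent. Several choices here are assumptions that the slides do not specify: a fixed step size, a stopping tolerance, and a gradient estimated numerically by central differences. The function names `numerical_gradient` and `gradient_descent` are hypothetical.

```python
from typing import Callable, List

Vector = List[float]

def numerical_gradient(f: Callable[[Vector], float], x: Vector, h: float = 1e-6) -> Vector:
    """Estimate each partial derivative of f at x by central differences."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad

def gradient_descent(f: Callable[[Vector], float], x0: Vector,
                     step: float = 0.1, iters: int = 1000, tol: float = 1e-8) -> Vector:
    """Repeatedly take a small step in the downhill direction (the negative gradient)."""
    x = list(x0)
    for _ in range(iters):
        g = numerical_gradient(f, x)
        # Stop once the slope is (numerically) flat in every dimension.
        if max(abs(gi) for gi in g) < tol:
            break
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

# Usage: a bowl-shaped function with its minimum at (3, -1).
bowl = lambda x: (x[0] - 3) ** 2 + (x[1] + 1) ** 2
print(gradient_descent(bowl, [0.0, 0.0]))  # approximately [3.0, -1.0]
```

Like any hill climber, this stops at whichever minimum the starting point drains into, which may be local rather than global.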
Simulated Annealing
A term borrowed from metalworking
• We want metal molecules to find a stable location relative to neighbors
• Heating causes metal molecules to jump around and to take on undesirable (high-energy) locations
• During cooling, molecules reduce their movement and settle into a more stable (low-energy) position
• Annealing is the process of heating metal and letting it cool slowly to lock in the stable locations of the molecules
Simulated Annealing
“Be the Ball”
• You have a wrinkled sheet of metal
• Place a BB on the sheet; what happens?
– The BB rolls downhill
– The BB stops at the bottom of a hill (local or global min?)
– The BB’s momentum may carry it out of one hill into another (local or global)
• By shaking the metal sheet, you are adding energy (heat)
• How hard do you shake?
Our Simulated Annealing Algorithm
“You’re not being the ball, Danny” (Caddyshack)
• Gravity is great because it tells the ball which way is downhill at all times
• We don’t have gravity, so how do we find a successor state?
– Randomness
AKA Monte Carlo
AKA Stochastic
Algorithm Outline
Select some initial guess of evaluation function parameters: $x_0$
Evaluate the evaluation function: $v = f(x_0)$
Compute a random displacement: $x' = x_0 + \Delta x$
• The Monte Carlo event
Evaluate $v' = f(x')$
• If $v' < v$, set the new state: $x_0 = x'$
• Else set $x_0 = x'$ with probability Prob(E, T)
– This is the Metropolis step
Repeat with the updated state and temperature
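The outline above can be sketched as a loop that minimizes `f`. This is a minimal sketch, not the lecture's own code: the function name `simulated_annealing`, the user-supplied `displace` move, the starting temperature, the geometric cooling rule, and returning the best state seen (rather than the final one) are all assumptions. The Boltzmann constant is taken as k = 1.

```python
import math
import random
from typing import Callable, Optional, Tuple, TypeVar

S = TypeVar("S")

def simulated_annealing(
    f: Callable[[S], float],
    x0: S,
    displace: Callable[[S, random.Random], S],
    t0: float = 1.0,
    cooling: float = 0.95,
    steps_per_temp: int = 100,
    t_min: float = 1e-3,
    rng: Optional[random.Random] = None,
) -> Tuple[S, float]:
    """Minimize f: guess, displace at random, evaluate, accept or reject."""
    rng = rng or random.Random()
    x, v = x0, f(x0)                    # initial guess and its evaluation
    best_x, best_v = x, v               # extra bookkeeping: best state seen so far
    t = t0
    while t > t_min:
        for _ in range(steps_per_temp):
            x_new = displace(x, rng)    # the Monte Carlo event
            v_new = f(x_new)
            delta = v_new - v
            # Downhill moves are always taken; uphill moves are taken with
            # probability exp(-delta / t) (the Metropolis step, with k = 1).
            if delta < 0 or rng.random() < math.exp(-delta / t):
                x, v = x_new, v_new
                if v < best_v:
                    best_x, best_v = x, v
        t *= cooling                    # geometric cooling schedule (an assumption)
    return best_x, best_v

# Usage: minimize (x - 2)^2 using random steps in [-0.5, 0.5].
rng = random.Random(0)
step = lambda x, r: x + r.uniform(-0.5, 0.5)
best_x, best_v = simulated_annealing(lambda x: (x - 2) ** 2, 0.0, step, rng=rng)
print(best_x, best_v)
```

Because `displace` is any function returning a neighboring state, the same loop also works on discrete states such as tours.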
Metropolis Step
We approximate nature’s alignment of molecules by allowing uphill transitions with some probability
• Prob(in energy state E) ~ $e^{-E/kT}$
– Boltzmann probability distribution
– Even when T is small, there is still a chance of being in a high-energy state
• Prob(transferring from $E_1$ to $E_2$) = $e^{-(E_2 - E_1)/kT}$
– Metropolis step
– If $E_2 < E_1$, this expression is greater than 1, so the transfer is always taken (probability capped at 1)
– If $E_2 > E_1$, we may transfer to a higher-energy state
The rate at which T is decreased and the amount it is decreased are prescribed by an annealing schedule
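A small sketch of the Metropolis probability and one possible annealing schedule. It shows how the same uphill move becomes less likely as T falls. The helper names are hypothetical, k defaults to 1, and the geometric schedule is only one common choice. The slides do not prescribe a particular schedule.

```python
import math
from typing import List

def metropolis_probability(e1: float, e2: float, t: float, k: float = 1.0) -> float:
    """Probability of accepting a move from energy e1 to energy e2 at temperature t."""
    if e2 <= e1:
        # exp(-(e2 - e1) / kT) >= 1 here, so the move is always taken.
        return 1.0
    return math.exp(-(e2 - e1) / (k * t))

def geometric_schedule(t0: float, alpha: float, n: int) -> List[float]:
    """Annealing schedule T_i = t0 * alpha**i (one common choice among many)."""
    return [t0 * alpha ** i for i in range(n)]

# The same uphill move (energy +1) at T = 10, 1, 0.1 (up to float rounding).
for t in geometric_schedule(10.0, 0.1, 3):
    print(t, metropolis_probability(0.0, 1.0, t))
```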
What have we got?
Always move downhill if possible
Sometimes go uphill
• More likely at the start, when T is high
Optimality guaranteed with a slow enough annealing schedule
No need for a smooth search space
• We do not need to know what the nearby successor is
Can be a discrete search space
• Traveling salesman problem
More info: Numerical Recipes in C (online) Chapter 10.9
Local Beam Search
Keep more previous states in memory
• Simulated annealing kept just one previous state in memory
• This search keeps k states in memory
Generate k initial states
If any state is a goal, terminate
Else, generate all successors and select the best k
Repeat
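The loop above can be sketched as follows. The function name, the toy problem, and the fallback of returning the best state in the final beam when no goal is found are illustrative assumptions. Note that the successors of all k states go into one shared pool before the best k are chosen.

```python
import heapq
from typing import Callable, Iterable, List, TypeVar

S = TypeVar("S")

def local_beam_search(
    initial_states: List[S],
    successors: Callable[[S], Iterable[S]],
    value: Callable[[S], float],
    is_goal: Callable[[S], bool],
    max_iters: int = 100,
) -> S:
    beam = list(initial_states)
    k = len(beam)                        # beam width = number of initial states
    for _ in range(max_iters):
        for s in beam:
            if is_goal(s):               # if any state is a goal, terminate
                return s
        # Pool the successors of ALL k states, then keep the best k overall.
        pool = [c for s in beam for c in successors(s)]
        if not pool:
            break
        beam = heapq.nlargest(k, pool, key=value)
    return max(beam, key=value)          # no goal found: best state in the final beam

# Hypothetical toy problem: reach 42 from a few integers using moves -1, +1, *2.
result = local_beam_search(
    [1, 5, 10],
    successors=lambda s: [s - 1, s + 1, 2 * s],
    value=lambda s: -abs(s - 42),
    is_goal=lambda s: s == 42,
)
print(result)  # 42
```

In this run, some beam members contribute no successors to the next beam while others contribute several, which is the information sharing described on the next slide.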
Isn’t this steepest ascent in parallel?
Information is shared between k search points
• Each of the k states generates successors
• The best k successors are selected
• Some search points may contribute none to the best successors
• One search point may contribute all k successors
– “Come over here, the grass is greener” (Russell and Norvig)
• If run as k independent searches in parallel, no search point would be terminated like this
Beam Search
Premature termination of search paths?
• Stochastic beam search
– Instead of choosing the best k successors
– Choose k successors at random
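A sketch of stochastic beam search. The slide only says "at random." Following Russell and Norvig, this version makes the choice more likely for higher-valued successors. The specific softmax weighting, the `temperature` parameter, and sampling with replacement are assumptions, as is the function name.

```python
import math
import random
from typing import Callable, Iterable, List, Optional, TypeVar

S = TypeVar("S")

def stochastic_beam_search(
    initial_states: List[S],
    successors: Callable[[S], Iterable[S]],
    value: Callable[[S], float],
    is_goal: Callable[[S], bool],
    max_iters: int = 100,
    temperature: float = 1.0,
    rng: Optional[random.Random] = None,
) -> S:
    rng = rng or random.Random()
    beam = list(initial_states)
    k = len(beam)
    for _ in range(max_iters):
        for s in beam:
            if is_goal(s):
                return s
        pool = [c for s in beam for c in successors(s)]
        if not pool:
            break
        # Sample k successors (with replacement), favoring higher values.
        # Subtracting the max value keeps math.exp from overflowing.
        values = [value(c) for c in pool]
        top = max(values)
        weights = [math.exp((v - top) / temperature) for v in values]
        beam = rng.choices(pool, weights=weights, k=k)
    return max(beam, key=value)
```

A high `temperature` makes the choice close to uniform. A very low one makes it almost greedy, so weaker search paths are no longer cut off deterministically at moderate settings.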
Genetic Algorithms
• Genetic algorithms (GAs) are a technique for solving problems that need optimization
• GAs are a subclass of evolutionary computing
• GAs are based on evolution
• History of GAs
– Evolutionary computing evolved in the 1960s
– GAs were created by John Holland in the mid-1970s
Genetic Programming
When applied to pieces of executable programs, the approaches are classified as genetic programming (GP)
GP operates at a higher level of abstraction than GA
Components of a GA
A problem to solve, and ...
• Encoding technique (gene, chromosome)
• Initialization procedure (creation)
• Evaluation function (environment)
• Selection of parents (reproduction)
• Genetic operators (mutation, recombination)
• Parameter settings (practice and art)
A “Population”
http://ilab.usc.edu/classes/2003cs460/notes/session26.ppt
Ranking by Fitness:
Mate Selection:
Fittest are copied and replace less-fit
Mate Selection Roulette: (figure: roulette wheel with slices of 11%, 38%, 7%, 16%, 0%, 3%, and 25%)
Increases the likelihood that the fittest reproduce, but does not guarantee it
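A sketch of roulette-wheel (fitness-proportionate) selection. Each individual owns a slice of the wheel proportional to its fitness, and each spin picks the slice under a uniformly random point. The function name and the labels A to G are hypothetical. The fitness values are the percentages from the roulette figure, taken in the order they appear.

```python
import bisect
import itertools
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")

def roulette_select(population: Sequence[T], fitnesses: Sequence[float], n: int,
                    rng: Optional[random.Random] = None) -> List[T]:
    """Pick n parents; each individual's chance is proportional to its fitness."""
    if len(population) != len(fitnesses) or any(f < 0 for f in fitnesses) or sum(fitnesses) <= 0:
        raise ValueError("need one non-negative fitness per individual, with a positive total")
    rng = rng or random.Random()
    # Running totals mark where each slice of the wheel ends.
    boundaries = list(itertools.accumulate(fitnesses))
    total = boundaries[-1]
    picks = []
    for _ in range(n):
        spin = rng.random() * total                     # a point in [0, total)
        index = bisect.bisect_right(boundaries, spin)   # slice containing that point
        picks.append(population[min(index, len(population) - 1)])  # float-rounding guard
    return picks

# Usage with the wheel's percentages; "E" has a 0% slice and is never picked.
parents = roulette_select(list("ABCDEFG"), [11, 38, 7, 16, 0, 3, 25], n=10, rng=random.Random(42))
print(parents)
```

The fittest individual ("B") is picked most often but not always, which matches the point above.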
Crossover:
Exchanging some part of the parents’ information (representation)
Exploit Goodness
Mutation:
Random change of binary digits from 0 to 1 and vice versa (to avoid local minima)
Explore unknown
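A sketch of the two operators on bit strings: one-point crossover (swap the tails after a random cut) and bit-flip mutation. The function names, the requirement that the cut lie strictly inside the string, and a per-bit mutation rate of 1/length are illustrative assumptions.

```python
import random
from typing import List, Tuple

Bits = List[int]

def one_point_crossover(p1: Bits, p2: Bits, rng: random.Random) -> Tuple[Bits, Bits]:
    """Cut both parents at the same random point and swap the tails."""
    if len(p1) != len(p2) or len(p1) < 2:
        raise ValueError("parents must have equal length >= 2")
    cut = rng.randint(1, len(p1) - 1)   # cut strictly inside the string
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def bit_flip_mutation(bits: Bits, rate: float, rng: random.Random) -> Bits:
    """Flip each bit (0 -> 1, 1 -> 0) independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]

rng = random.Random(7)
child1, child2 = one_point_crossover([0] * 8, [1] * 8, rng)
mutant = bit_flip_mutation(child1, rate=1 / 8, rng=rng)
print(child1, child2, mutant)
```

With a cut strictly inside the string, the first and last genes of a child always come from different parents.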
Best Design
The GA Cycle
The simple GA
Shows many shortcomings, e.g.
• Representation is too restrictive
• Mutation & crossover are only applicable to bit-string & integer representations
• Selection mechanism is sensitive to converging populations with close fitness values
• Very robust but slow
– Can make simulated annealing seem fast
• In the limit, optimal
A.E. Eiben and J.E. Smith, Introduction to Evolutionary Computing
Alternative Crossover Operators
Performance with 1-point crossover depends on the order in which variables occur in the representation
• More likely to keep together genes that are near each other
• Can never keep together genes from opposite ends of the string
• This is known as positional bias
• Can be exploited if we know about the structure of our problem, but this is not usually the case
n-point crossover
• Choose n random crossover points
• Split along those points
• Glue parts, alternating between parents
• Generalization of 1-point (still some positional bias)
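A sketch of the n-point steps above: choose n distinct interior cut points, split both parents there, and glue the segments while alternating parents. The function name and signature are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

G = TypeVar("G")

def n_point_crossover(p1: Sequence[G], p2: Sequence[G], n: int,
                      rng: random.Random) -> Tuple[List[G], List[G]]:
    length = len(p1)
    if len(p2) != length or not 1 <= n < length:
        raise ValueError("need equal-length parents and 1 <= n < length")
    cuts = sorted(rng.sample(range(1, length), n))   # n distinct interior cut points
    child1: List[G] = []
    child2: List[G] = []
    src1, src2 = p1, p2
    prev = 0
    for cut in cuts + [length]:
        child1.extend(src1[prev:cut])    # glue this segment...
        child2.extend(src2[prev:cut])
        src1, src2 = src2, src1          # ...then alternate parents
        prev = cut
    return child1, child2

c1, c2 = n_point_crossover([0] * 10, [1] * 10, 2, random.Random(5))
print(c1, c2)
```

With n = 1 this reduces to one-point crossover. With an even n, both ends of a child come from the same parent, but genes that sit close together are still more likely to be inherited together.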
Uniform crossover
• Assign 'heads' to one parent, 'tails' to the other
• Flip a coin for each gene of the first child
• Make an inverse copy of the gene for the second child
• Inheritance is independent of position
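A sketch of the coin-flip procedure above. The function name and the fair coin (probability 0.5) are assumptions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

G = TypeVar("G")

def uniform_crossover(p1: Sequence[G], p2: Sequence[G],
                      rng: random.Random) -> Tuple[List[G], List[G]]:
    if len(p1) != len(p2):
        raise ValueError("parents must have equal length")
    child1: List[G] = []
    child2: List[G] = []
    for g1, g2 in zip(p1, p2):
        if rng.random() < 0.5:      # 'heads': child 1 takes this gene from parent 1
            child1.append(g1)
            child2.append(g2)       # child 2 gets the inverse choice
        else:                       # 'tails': child 1 takes this gene from parent 2
            child1.append(g2)
            child2.append(g1)
    return child1, child2

c1, c2 = uniform_crossover(list("AAAAAAAA"), list("BBBBBBBB"), random.Random(2))
print("".join(c1), "".join(c2))
```

Each gene's coin flip is independent of its position, so uniform crossover has no positional bias.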
Crossover
• Early stages are diverse
– Crossover explores the state space broadly
• Later stages are more similar
– Crossover fine-tunes in a small region
(Taken together: like simulated annealing)
Mutation
Could screw up a good solution
• Like the Metropolis step in simulated annealing
Could explore an untapped part of the search space
Crossover OR mutation?
Decade-long debate…
Answer (at least, rather wide agreement):
• It depends on the problem, but in general it is good to have both
– Mutation alone would work
Crossover OR mutation? (cont’d)
• There is cooperation AND competition between them
• Crossover is explorative: it makes a big jump to an area somewhere “in between” two (parent) areas
• Mutation is exploitative: it creates random small diversions, thereby staying near (in the area of) the parent
• To hit the optimum you often need a ‘lucky’ mutation
A Simple Example
The Traveling Salesman Problem:
Find a tour of a given set of cities so that
• each city is visited exactly once
• the total distance traveled is minimized
From Wendy Williams, web.umr.edu/~ercal/387/slides/GATutorial.ppt
Representation
Representation is an ordered list of city numbers; a GA using it is known as an order-based GA.
1) London 3) Dublin 5) Beijing 7) Tokyo
2) Venice 4) Singapore 6) Phoenix 8) Victoria
CityList1 (3 5 7 2 1 6 4 8)
CityList2 (2 5 7 6 8 1 3 4)
Crossover combines inversion and recombination (crossover points after positions 2 and 6):
Parent1 (3 5 7 2 1 6 4 8)
Parent2 (2 5 7 6 8 1 3 4)
Child (5 8 7 2 1 6 3 4)
This operator is called the Order1 crossover.
Mutation involves reordering the list (here, swapping the cities at positions 3 and 6):
Before: (5 8 7 2 1 6 3 4)
After: (5 8 6 2 1 7 3 4)
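A sketch of Order1 crossover and swap mutation that reproduces the example above. Positions are 0-based here, so the kept segment is `p1[2:6]` = (7 2 1 6). The function names and the choice to pass the cut points and swap positions explicitly (a GA would draw them at random) are assumptions.

```python
from typing import List, Sequence

def order1_crossover(p1: Sequence[int], p2: Sequence[int], start: int, end: int) -> List[int]:
    """Keep p1[start:end]; fill the other slots with p2's remaining cities in p2's order."""
    n = len(p1)
    child = list(p1)                          # the slice p1[start:end] stays as-is
    kept = set(p1[start:end])
    # Read p2 starting just after the slice (wrapping around), skipping kept cities.
    order = [p2[(end + i) % n] for i in range(n)]
    remaining = [city for city in order if city not in kept]
    # Write them into the other slots, also starting just after the slice.
    empty_slots = [(end + i) % n for i in range(n - (end - start))]
    for slot, city in zip(empty_slots, remaining):
        child[slot] = city
    return child

def swap_mutation(tour: Sequence[int], i: int, j: int) -> List[int]:
    """Exchange the cities at positions i and j (0-based)."""
    mutated = list(tour)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated

parent1 = [3, 5, 7, 2, 1, 6, 4, 8]
parent2 = [2, 5, 7, 6, 8, 1, 3, 4]
child = order1_crossover(parent1, parent2, start=2, end=6)
print(child)                        # [5, 8, 7, 2, 1, 6, 3, 4]
print(swap_mutation(child, 2, 5))   # [5, 8, 6, 2, 1, 7, 3, 4]
```

Both operators always return a valid tour (a permutation), which ordinary bit-string crossover and mutation would not guarantee.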
Mutation
TSP Example: 30 Cities
Distance = 941
Distance = 800
Distance = 652
Distance = 420
Overview of performance
Cycle crossover example
Step 1: identify cycles
Step 2: copy alternate cycles into offspring
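Since the slide's figure did not survive, here is a sketch of the two steps on illustrative parents (not taken from the lost figure). Step 1 follows position i to the value p2[i], then to that value's position in p1, until it returns to a visited position. Step 2 gives each child even-numbered cycles from one parent and odd-numbered cycles from the other. The function name is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

def cycle_crossover(p1: Sequence[int], p2: Sequence[int]) -> Tuple[List[int], List[int]]:
    n = len(p1)
    position_in_p1: Dict[int, int] = {gene: i for i, gene in enumerate(p1)}
    cycle_of = [-1] * n                  # -1 marks positions not yet in a cycle
    cycle = 0
    # Step 1: identify cycles.
    for start in range(n):
        if cycle_of[start] != -1:
            continue
        i = start
        while cycle_of[i] == -1:
            cycle_of[i] = cycle
            i = position_in_p1[p2[i]]
        cycle += 1
    # Step 2: copy alternate cycles into offspring.
    child1 = [p1[i] if cycle_of[i] % 2 == 0 else p2[i] for i in range(n)]
    child2 = [p2[i] if cycle_of[i] % 2 == 0 else p1[i] for i in range(n)]
    return child1, child2

c1, c2 = cycle_crossover([1, 2, 3, 4, 5, 6, 7, 8, 9], [9, 3, 7, 8, 2, 6, 5, 1, 4])
print(c1)  # [1, 3, 7, 4, 2, 6, 5, 8, 9]
print(c2)  # [9, 2, 3, 8, 5, 6, 7, 1, 4]
```

Every city in each child keeps the position it had in one of the parents, and each child is still a valid permutation.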
Issues for GA Practitioners
Choosing basic implementation issues:
• representation
• population size, mutation rate, ...
• selection, deletion policies
• crossover, mutation operators
Termination criteria
Performance, scalability
Solution is only as good as the evaluation function (often the hardest part)
Benefits of Genetic Algorithms
• Concept is easy to understand
• Modular, separate from application
• Supports multi-objective optimization
• Good for “noisy” environments
• Always an answer; answer gets better with time
• Inherently parallel; easily distributed
Benefits of Genetic Algorithms (cont’d)
• Many ways to speed up and improve a GA-based application as knowledge about the problem domain is gained
• Easy to exploit previous or alternate solutions
• Flexible building blocks for hybrid applications
• Substantial history and range of use