CS 416 Artificial Intelligence

Lecture 7

Optimization

Tomorrow’s Demos

TA will be at Thornton Stacks

Get your stuff set up to demo

Find Chris and let him know you’re ready

Hill Climbing

Continuous spaces

• Derivative tells you “downhill” direction

Hill Climbing

Multidimensional continuous spaces

• Derivative computation (the gradient) defines a tangent hyperplane, whose slope gives the downhill direction
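As a rough illustration of following the derivative downhill in a multidimensional continuous space, here is a minimal gradient-descent sketch. The function names, the central-difference gradient, the fixed step size, and the test function `bowl` are all assumptions made for this example; the slides do not prescribe them.

```python
def numerical_gradient(f, x, h=1e-6):
    """Estimate the gradient of f at x with central differences (assumed helper)."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def gradient_descent(f, x0, step=0.1, iterations=200):
    """Repeatedly move a small step against the gradient, i.e. downhill."""
    x = list(x0)
    for _ in range(iterations):
        g = numerical_gradient(f, x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def bowl(p):
    """Hypothetical smooth test function with its minimum at (1, -2)."""
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2


minimum = gradient_descent(bowl, [5.0, 5.0])
```

On a function with several valleys, the same loop would stop in whichever local minimum is downhill from the starting point.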

Simulated Annealing

A term borrowed from metalworking

• We want metal molecules to find a stable location relative to neighbors

• Heating causes metal molecules to jump around and to take on undesirable (high energy) locations

• During cooling, molecules reduce their movement and settle into a more stable (low energy) position

• Annealing is the process of heating metal and letting it cool slowly to lock in the stable locations of the molecules

Simulated Annealing

“Be the Ball”

• You have a wrinkled sheet of metal

• Place a BB on the sheet and what happens?

– BB rolls downhill

– BB stops at bottom of hill (local or global min?)

– BB momentum may carry it out of hill into another (local or global)

• By shaking the metal sheet, you are adding energy (heat)

• How hard do you shake?

Our Simulated Annealing Algorithm

“You’re not being the ball, Danny” (Caddyshack)

• Gravity is great because it tells the ball which way is downhill at all times

• We don’t have gravity, so how do we find a successor state?

– Randomness

AKA Monte Carlo

AKA Stochastic

Algorithm Outline

Select some initial guess of evaluation function parameters: $x$

Evaluate evaluation function, $E(x) = v$

Compute a random displacement, $x'$

• The Monte Carlo event

Evaluate $E(x') = v'$

• If $v' < v$; set new state, $x \leftarrow x'$

• Else set $x \leftarrow x'$ with Prob(E,T)

– This is the Metropolis step

Repeat with updated state and temp
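The outline can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a one-dimensional state, a uniform random displacement, the usual exponential acceptance rule for Prob(E,T), and a geometric cooling factor. The names `simulated_annealing` and `bumpy` and all parameter defaults are hypothetical.

```python
import math
import random


def simulated_annealing(energy, x0, temp=10.0, cooling=0.995,
                        step=1.0, iterations=5000, rng=None):
    """Minimize energy(x) over a 1-D state; returns the best state seen."""
    rng = rng or random.Random(0)
    x, v = x0, energy(x0)
    best_x, best_v = x, v
    for _ in range(iterations):
        x_new = x + rng.uniform(-step, step)   # the Monte Carlo event
        v_new = energy(x_new)
        # Downhill moves are always taken; uphill moves are taken with a
        # probability that shrinks as the temperature drops (Metropolis step).
        if v_new < v or rng.random() < math.exp(-(v_new - v) / temp):
            x, v = x_new, v_new
            if v < best_v:
                best_x, best_v = x, v
        temp *= cooling                        # assumed annealing schedule
    return best_x, best_v


def bumpy(x):
    """Hypothetical evaluation function with several local minima."""
    return x * x + 10.0 * math.sin(3.0 * x)


best_x, best_v = simulated_annealing(bumpy, 4.0)
```

Tracking the best state seen so far is a convenience added here; the outline itself only carries the current state forward.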

Metropolis Step

We approximate nature’s alignment of molecules by allowing uphill transitions with some probability

• Prob (in energy state E) ~ $e^{-E/kT}$

– Boltzmann Probability Distribution

– Even when T is small, there is still a chance of being in a high energy state

• Prob (transferring from $E_1$ to $E_2$) = $e^{-(E_2 - E_1)/kT}$

– Metropolis Step

– if $E_2 < E_1$, prob () is greater than 1, so the transfer is always made

– if $E_2 > E_1$, we may transfer to higher energy state

The rate at which T is decreased and the amount it is decreased is prescribed by an annealing schedule
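To make the effect of temperature concrete, the following sketch computes the Metropolis acceptance probability and a simple geometric annealing schedule. Setting the constant k to 1 and using a geometric schedule are assumptions for illustration; the function names are hypothetical.

```python
import math


def acceptance_probability(e1, e2, temperature, k=1.0):
    """Probability of moving from energy e1 to e2 at the given temperature."""
    if e2 <= e1:
        return 1.0  # downhill (or level) moves are always accepted
    return math.exp(-(e2 - e1) / (k * temperature))


def geometric_schedule(t0, alpha, steps):
    """Assumed annealing schedule: multiply T by alpha at every step."""
    return [t0 * alpha ** i for i in range(steps)]


# The same uphill move becomes less likely as the temperature falls.
for t in geometric_schedule(10.0, 0.5, 5):
    print(t, acceptance_probability(1.0, 2.0, t))
```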

What have we got?

Always move downhill if possible

Sometimes go uphill

• More likely at start when T is high

Optimality guaranteed only in the limit of a sufficiently slow annealing schedule

No need for smooth search space

• We do not need to know what nearby successor is

Can be discrete search space

• Traveling salesman problem

More info: Numerical Recipes in C (online) Chapter 10.9

Local Beam Search

Keep more previous states in memory

• Simulated Annealing just kept one previous state in memory

• This search keeps k states in memory

Generate k initial states

if any state is a goal, terminate

else, generate all successors and select best k

repeat
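The loop above might look like this in Python. This is a sketch: the callback names (`successors`, `value`, `is_goal`), the iteration cap, and the toy integer problem are assumptions introduced for the example.

```python
import heapq


def local_beam_search(initial_states, successors, value, is_goal,
                      max_iterations=100):
    """Keep the k best states; k is the number of initial states."""
    beam = list(initial_states)
    k = len(beam)
    for _ in range(max_iterations):
        for state in beam:
            if is_goal(state):
                return state
        # Pool the successors of all k states, then keep the best k overall.
        pool = [s for state in beam for s in successors(state)]
        if not pool:
            break
        beam = heapq.nlargest(k, pool, key=value)
    return max(beam, key=value)


# Hypothetical toy problem: walk along the integers toward 42.
found = local_beam_search(
    initial_states=[0, 10, 90],
    successors=lambda x: [x - 1, x + 1],
    value=lambda x: -(x - 42) ** 2,
    is_goal=lambda x: x == 42,
)
```

In this toy run the start state 90 contributes nothing after the first step, because its successors score worse than those of 0 and 10.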

Isn’t this steepest ascent in parallel?

Information is shared between k search points

• Each of the k states generates successors

• Best k successors are selected

• Some search points may contribute none to best successors

• One search point may contribute all k successors

– “Come over here, the grass is greener” (Russell and Norvig)

• If the k searches were simply run independently in parallel, no search points would be terminated like this

Beam Search

Premature termination of search paths?

• Stochastic beam search

– Instead of choosing best k successors

– Choose k successors at random, with better successors more likely to be chosen
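One way to sketch the random selection step is to draw successors with weights that grow with their value. The exponential weighting, the `temperature` parameter, and sampling with replacement are assumptions for this illustration, not something the slides specify.

```python
import math
import random


def stochastic_select(pool, value, k, rng, temperature=1.0):
    """Draw k successors at random, favoring higher-valued ones."""
    best = max(value(s) for s in pool)
    # Subtracting the best value keeps every exponent <= 0 (no overflow).
    weights = [math.exp((value(s) - best) / temperature) for s in pool]
    return rng.choices(pool, weights=weights, k=k)  # samples with replacement


rng = random.Random(0)
pool = [3, 8, 15, 40, 41]
next_beam = stochastic_select(pool, lambda x: -abs(x - 42), k=3, rng=rng,
                              temperature=5.0)
```

A larger `temperature` makes the draw closer to uniform; a smaller one makes it behave like choosing the best k.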

Genetic Algorithms

• Genetic algorithms (GAs) are a technique to solve problems which need optimization

• GAs are a subclass of Evolutionary Computing

• GAs are based on evolution

• History of GAs

– Evolutionary computing evolved in the 1960s

– GAs were created by John Holland in the mid-70s

Genetic Programming

When applied to pieces of executable programs, the approaches are classified as genetic programming (GP)

GP operates at a higher level of abstraction than GA

Components of a GA

A problem to solve, and ...

• Encoding technique (gene, chromosome)

• Initialization procedure (creation)

• Evaluation function (environment)

• Selection of parents (reproduction)

• Genetic operators (mutation, recombination)

• Parameter settings (practice and art)

A “Population”

http://ilab.usc.edu/classes/2003cs460/notes/session26.ppt

Ranking by Fitness:

Mate Selection:

Fittest are copied and replace less-fit

Mate Selection Roulette:

Roulette wheel shares (from figure): 11%, 38%, 7%, 16%, 0%, 3%, 25%

Increases the likelihood of, but does not guarantee, reproduction of the fittest
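Roulette selection can be sketched as a single spin over cumulative fitness. The function name and the use of the wheel's percentage shares directly as fitness values are assumptions for this example.

```python
import random


def roulette_select(fitnesses, rng):
    """Return the index of one parent, chosen in proportion to its fitness."""
    total = sum(fitnesses)
    spin = rng.uniform(0, total)
    running = 0.0
    for index, fitness in enumerate(fitnesses):
        running += fitness
        if spin < running:
            return index
    # Guard for the rare case where spin lands exactly on the total.
    return max(i for i, f in enumerate(fitnesses) if f > 0)


shares = [11, 38, 7, 16, 0, 3, 25]   # percentages read off the wheel
rng = random.Random(1)
counts = [0] * len(shares)
for _ in range(10000):
    counts[roulette_select(shares, rng)] += 1
```

The individual with a 0% share is never picked, while the 38% individual is picked most often but not every time.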

Crossover:

Exchanging information by swapping some part of the representation

Exploit Goodness

Mutation:

Random change of binary digits from 0 to 1 and vice versa (to avoid local minima)

Explore unknown

Best Design

The GA Cycle

The simple GA

Shows many shortcomings, e.g.

• Representation is too restrictive

• Mutation & crossovers only applicable for bit-string & integer representations

• Selection mechanism sensitive for converging populations with close fitness values

• Very robust but slow

– Can make simulated annealing seem fast

• In the limit, optimal

A.E. Eiben and J.E. Smith, Introduction to Evolutionary Computing

Alternative Crossover Operators

Performance with 1 Point Crossover depends on the order that variables occur in the representation

• More likely to keep together genes that are near each other

• Can never keep together genes from opposite ends of string

• This is known as Positional Bias

• Can be exploited if we know about the structure of our problem, but this is not usually the case

n-point crossover

• Choose n random crossover points

• Split along those points

• Glue parts, alternating between parents

• Generalization of 1 point (still some positional bias)
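The split-and-glue steps can be sketched as follows. The function name is hypothetical, and the choice to draw distinct cut points strictly inside the string is an assumption; with n = 1 this reduces to 1 point crossover.

```python
import random


def n_point_crossover(parent_a, parent_b, n, rng):
    """Cut both parents at n random points and alternate the segments."""
    length = len(parent_a)
    # n distinct cut points strictly inside the string, in increasing order.
    points = sorted(rng.sample(range(1, length), n))
    child_a, child_b = [], []
    source_a, source_b = parent_a, parent_b
    start = 0
    for point in points + [length]:
        child_a.extend(source_a[start:point])
        child_b.extend(source_b[start:point])
        source_a, source_b = source_b, source_a   # switch parents at each cut
        start = point
    return child_a, child_b


rng = random.Random(0)
child_a, child_b = n_point_crossover([0] * 8, [1] * 8, n=2, rng=rng)
```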

Uniform crossover

• Assign 'heads' to one parent, 'tails' to the other

• Flip a coin for each gene of the first child

• Make an inverse copy of the gene for the second child

• Inheritance is independent of position
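A minimal sketch of the coin-flip procedure, assuming a fair coin and list-based chromosomes (the function name is hypothetical):

```python
import random


def uniform_crossover(parent_a, parent_b, rng):
    """Flip a coin per gene; the second child gets the opposite parent's gene."""
    child_a, child_b = [], []
    for gene_a, gene_b in zip(parent_a, parent_b):
        if rng.random() < 0.5:          # heads: first child copies parent_a
            child_a.append(gene_a)
            child_b.append(gene_b)
        else:                           # tails: first child copies parent_b
            child_a.append(gene_b)
            child_b.append(gene_a)
    return child_a, child_b


rng = random.Random(0)
child_a, child_b = uniform_crossover([0] * 8, [1] * 8, rng)
```

Because each gene is decided independently, genes at opposite ends of the string are as likely to stay together as adjacent ones.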

Crossover

• Early states are diverse

– Crossover explores state space broadly

• Later states are more similar

– Crossover fine-tunes in a small region

Like simulated annealing

Mutation

Could screw up a good solution

• Like Metropolis step in simulated annealing

Could explore untapped part of search space

Crossover OR mutation?

Decade long debate…

Answer (at least, rather wide agreement):

• it depends on the problem, but in general, it is good to have both

– Mutation alone would work

Crossover OR mutation? (cont’d)

• There is co-operation AND competition between them

• Crossover is explorative, it makes a big jump to an area somewhere “in between” two (parent) areas

• Mutation is exploitative, it creates random small diversions, thereby staying near (in the area of) the parent

• To hit the optimum you often need a ‘lucky’ mutation

A Simple Example

The Traveling Salesman Problem:

Find a tour of a given set of cities so that

• each city is visited only once

• the total distance traveled is minimized

From Wendy Williams, web.umr.edu/~ercal/387/slides/GATutorial.ppt

Representation

Representation is an ordered list of city numbers, known as an order-based GA.

1) London 3) Dublin 5) Beijing 7) Tokyo

2) Venice 4) Singapore 6) Phoenix 8) Victoria

CityList1 (3 5 7 2 1 6 4 8)

CityList2 (2 5 7 6 8 1 3 4)

Crossover

Crossover combines inversion and recombination:

Parent1 (3 5 | 7 2 1 6 | 4 8)

Parent2 (2 5 | 7 6 8 1 | 3 4)

Child (5 8 | 7 2 1 6 | 3 4)

(| marks the two crossover points, starred on the slide)

This operator is called the Order1 crossover.
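The child in this example can be reproduced with the following sketch of an order crossover. It assumes the cut points fall before the third and after the sixth city, and that the remaining cities are read from the second parent starting just after the second cut and wrapping around. The function name and zero-based cut indices are choices made here.

```python
def order1_crossover(parent_a, parent_b, cut1, cut2):
    """Keep parent_a[cut1:cut2]; fill the rest in parent_b's order."""
    size = len(parent_a)
    child = [None] * size
    child[cut1:cut2] = parent_a[cut1:cut2]
    kept = set(parent_a[cut1:cut2])
    # Read parent_b starting after the second cut, wrapping around,
    # and skip the cities already copied from parent_a.
    fill = [parent_b[(cut2 + i) % size] for i in range(size)]
    fill = [city for city in fill if city not in kept]
    # Write the remaining cities starting after the second cut, wrapping.
    for offset, city in enumerate(fill):
        child[(cut2 + offset) % size] = city
    return child


parent1 = [3, 5, 7, 2, 1, 6, 4, 8]
parent2 = [2, 5, 7, 6, 8, 1, 3, 4]
child = order1_crossover(parent1, parent2, 2, 6)
```

Every city still appears exactly once in the child, which ordinary bit-string crossover would not guarantee for a tour.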

Mutation

Mutation involves reordering of the list. The two starred cities (third and sixth positions) swap places:

Before: (5 8 7 2 1 6 3 4)

After: (5 8 6 2 1 7 3 4)
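The reordering in the Before/After lists is a swap of two positions, which can be sketched as below. The function names are hypothetical, and picking the two positions uniformly at random is an assumption.

```python
import random


def swap_mutation(tour, i, j):
    """Return a copy of the tour with the cities at positions i and j swapped."""
    mutated = list(tour)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated


def random_swap_mutation(tour, rng):
    """Swap two distinct, randomly chosen positions."""
    i, j = rng.sample(range(len(tour)), 2)
    return swap_mutation(tour, i, j)


before = [5, 8, 7, 2, 1, 6, 3, 4]
after = swap_mutation(before, 2, 5)   # third and sixth positions
```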

TSP Example: 30 Cities

Distance = 941

Distance = 800

Distance = 652

Distance = 420

Overview of performance

Cycle crossover example

Step 1: identify cycles

Step 2: copy alternate cycles into offspring
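The two steps might be implemented as in this sketch. The function name is hypothetical, and the parent permutations are illustrative inputs chosen for the example, not values taken from the slides.

```python
def cycle_crossover(parent_a, parent_b):
    """Cycle crossover for permutations; returns two offspring."""
    size = len(parent_a)
    position_in_a = {gene: i for i, gene in enumerate(parent_a)}
    child_a, child_b = [None] * size, [None] * size
    visited = [False] * size
    take_from_a = True
    for start in range(size):
        if visited[start]:
            continue
        # Step 1: identify the cycle of positions that contains `start`.
        cycle = []
        index = start
        while not visited[index]:
            visited[index] = True
            cycle.append(index)
            index = position_in_a[parent_b[index]]
        # Step 2: copy alternate cycles from alternate parents.
        for i in cycle:
            if take_from_a:
                child_a[i], child_b[i] = parent_a[i], parent_b[i]
            else:
                child_a[i], child_b[i] = parent_b[i], parent_a[i]
        take_from_a = not take_from_a
    return child_a, child_b


p1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
p2 = [9, 3, 7, 8, 2, 6, 5, 1, 4]
child1, child2 = cycle_crossover(p1, p2)
```

Each gene in a child keeps the position it had in one of the parents, so both offspring remain valid permutations.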

Issues for GA Practitioners

Choosing basic implementation issues:

• representation

• population size, mutation rate, ...

• selection, deletion policies

• crossover, mutation operators

Termination Criteria

Performance, scalability

Solution is only as good as the evaluation function (often hardest part)

Benefits of Genetic Algorithms

• Concept is easy to understand

• Modular, separate from application

• Supports multi-objective optimization

• Good for “noisy” environments

• Always an answer; answer gets better with time

• Inherently parallel; easily distributed

Benefits of Genetic Algorithms

• Many ways to speed up and improve a GA-based application as knowledge about problem domain is gained

• Easy to exploit previous or alternate solutions

• Flexible building blocks for hybrid applications

• Substantial history and range of use
