56
Simulated Annealing Simulated Annealing Biostatistics 615/815 Lecture 18 Lecture 18

615.18 -- Simulated Annealing - umich.edu

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: 615.18 -- Simulated Annealing - umich.edu

Simulated AnnealingSimulated Annealing

Biostatistics 615/815Lecture 18Lecture 18

Page 2: 615.18 -- Simulated Annealing - umich.edu

So far …

“Greedy” optimization methods• Can get trapped at local minima• Outcome might depend on starting point

Examples:• Golden Search• Nelder-Mead Simplex Optimization• E-M Algorithm

Page 3: 615.18 -- Simulated Annealing - umich.edu

Today …

Simulated Annealing

Markov-Chain Monte-Carlo method

Designed to search for global minimum among many local minimaamong many local minima

Page 4: 615.18 -- Simulated Annealing - umich.edu

The Problem

Most minimization strategies find the nearest local minimum

Standard strategyStandard strategy• Generate trial point based on current estimates• Evaluate function at proposed locationEvaluate function at proposed location• Accept new value if it improves solution

Page 5: 615.18 -- Simulated Annealing - umich.edu

The Solution

We need a strategy to find other minima

This means, we must sometimes select new points that do not improve solutionnew points that do not improve solution

How?How?

Page 6: 615.18 -- Simulated Annealing - umich.edu

Annealing

One manner in which crystals are formed

Gradual cooling of liquid …• At high temperatures, molecules move freely• At low temperatures, molecules are "stuck"

If li i lIf cooling is slow• Low energy, organized crystal lattice formed

Page 7: 615.18 -- Simulated Annealing - umich.edu

Simulated Annealing

Analogy with thermodynamics

Incorporate a temperature parameter into the minimization procedure

At high temperatures, explore parameter space

At lower temperatures, restrict exploration

Page 8: 615.18 -- Simulated Annealing - umich.edu

Markov Chain

The Markovian property1 1 0 0 1 1Pr( | , , ) Pr( | )n n n n n n n nZ i Z i Z i Z i Z i− − − −= = = = = =K

Transition probability Pr( | )Q Z j Z i= = =

n-step transition

1Pr( | )ij n nQ Z j Z i−= = =

1 1

1 1

( )0Pr( | )

n

n

nij n ii i j

i iQ Z j Z i p p

= = = =∑ ∑L L

(all possible n-step paths i → j)

Page 9: 615.18 -- Simulated Annealing - umich.edu

Markov Chain (In Practice)

Start with some state (Zn=i)• A set of mixture parameters

Propose a change (Zn+1=j)• Edit i t t i• Edit mixture parameters in some way

Decide whether to accept change (Q )Decide whether to accept change (Qij)• Decision is based on relative probabilities of

two outcomestwo outcomes

Page 10: 615.18 -- Simulated Annealing - umich.edu

Simulated Annealing Strategy

Consider decreasing series of temperatures

For each temperature, iterate these steps:• Propose an update and evaluate function• A t d t th t i l ti• Accept updates that improve solution• Accept some updates that don't improve solution

• Acceptance probability depends on “temperature” parameterp p y p p p

If cooling is sufficiently slow, the global minimum ill b h dwill be reached

Page 11: 615.18 -- Simulated Annealing - umich.edu

Example Application

The traveling salesman problem• Salesman must visit every city in a set• Gi di b i f i i• Given distances between pairs of cities• Find the shortest route through the set

No practical deterministic algorithms for finding optimal solution are knownfinding optimal solution are known…• … simulated annealing and other stochastic

methods can do quite well

Page 12: 615.18 -- Simulated Annealing - umich.edu

Update Scheme

A good scheme should be able to:• Connect any two possible paths• Propose improvements to good solutionsPropose improvements to good solutions

Some possible update schemes:• Swap a pair of cities in current path• Invert a segment in current path

What do you think of these?

Page 13: 615.18 -- Simulated Annealing - umich.edu

HowHow simulated annealing

proceedsproceeds …

Page 14: 615.18 -- Simulated Annealing - umich.edu

A little more detail

Metropolis (1953), Hastings (1970)• Define a set of conditions that, if met, ensure

the random walk will sample from probability distribution at equilibrium• In theory• In theory

• Recommendations apply to how changes are eco e dat o s app y to o c a ges a eproposed and accepted

Page 15: 615.18 -- Simulated Annealing - umich.edu

Accepting an Update

The Metropolis criterion

Change from E0 to E with probability

⎞⎛ ⎫⎧⎟⎟⎠

⎞⎜⎜⎝

⎭⎬⎫

⎩⎨⎧ −−

TEE )(exp,1min 0

Given sufficient time, leads to equilibrium state

Page 16: 615.18 -- Simulated Annealing - umich.edu

Evaluating Proposals inEvaluating Proposals inSimulated Annealingint accept_proposal(double current, double proposal,

double temperature){double prob;

if (proposal < current)return 1;

if ( 0 0)if (temperature == 0.0)return 0;

prob = exp(-(proposal - current) / temperature);

return rand_prob() < prob;}

Page 17: 615.18 -- Simulated Annealing - umich.edu

Key Requirements

Irreducibility: all states must communicate• Some integer n exists where Q(n)

ij > 0

Aperiodicity: to further ensureSome integer n exists where Q(n)

ij > 0 for all i,j

These two conditions guarantee existence of a unique equilibrium distribution

Page 18: 615.18 -- Simulated Annealing - umich.edu

Equilibrium Distribution

Limiting matrix of Q(n) should have identical rows π• Starting point does not affect results

Equilibrium distribution π satisfies

( )π π⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟% %( )1lim limn n

n nP P P P

π π

+

→∞ →∞

⎜ ⎟ ⎜ ⎟= = =⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠

M M

% %

Page 19: 615.18 -- Simulated Annealing - umich.edu

Equilibrium Distribution Equilibrium Distribution (Simulated Annealing)

Probability of state with energy k is

⎟⎠⎞

⎜⎝⎛−∝=

TkkEP exp)(

At low T, probability is concentrated in low

⎠⎝ T

energy states

Page 20: 615.18 -- Simulated Annealing - umich.edu

Simulated Annealing Recipe

11. Select starting temperature and initial parameter values

2. Randomly select a new point in the i hb h d f th i i lneighborhood of the original

33. Compare the two points using the Metropolis criterion

Page 21: 615.18 -- Simulated Annealing - umich.edu

Simulated Annealing Recipe

4. Repeat steps 2 and 3 until system reaches equilibrium state…

In practice, repeat the process N times for large N

5. Decrease temperature and repeat the above steps, stop when system reaches frozen state

Page 22: 615.18 -- Simulated Annealing - umich.edu

Practical Issues

The maximum temperaturep

Scheme for decreasing temperatureScheme for decreasing temperature

St t f i d tStrategy for proposing updates

Page 23: 615.18 -- Simulated Annealing - umich.edu

Selecting a Nearby Point

Suggestion of Brooks and Morgan (1995) works well for our problem• Select a component to update• Sample from within plausible range

Many other alternatives• The authors of Numerical Recipes use a variant

of the Nelder-Mead method

Page 24: 615.18 -- Simulated Annealing - umich.edu

C Code:C Code:Simple Sampling Functions// Assume that function Random() generates // random numbers between 0.0 and 1.0// E l f l t 14 it bl// Examples from lecture 14 are suitable

// Random numbers within arbitrary range// Random numbers within arbitrary rangedouble randu(double min, double max)

{{return Random() * (max - min) + min;}

Page 25: 615.18 -- Simulated Annealing - umich.edu

Updating Means and Variances

Select component to update at random

Sample a new mean (or variance) within plausible range for parameterplausible range for parameter

Decide whether to accept proposalDecide whether to accept proposal

Page 26: 615.18 -- Simulated Annealing - umich.edu

C Code:C Code:Updating Meansdouble sa_means(int dim,

double * probs, double * means, double * sigmas,double llk, double temperature, double min, double max)

{int c = Random() * dim;double proposal, old = means[c];

means[c] = randu(min, max);i ( i i )proposal = -mixLLK(n, data, dim, probs, means, sigmas);

if (accept_proposal(llk, proposal, temperature))return proposal;

means[c] = old;return llk;}

Page 27: 615.18 -- Simulated Annealing - umich.edu

C Code:C Code:Updating Standard Deviationdouble sa_sigmas(int dim,

double * probs, double * means, double * sigmas,double llk, double temperature, double min, double max)

{int c = Random() * dim;double proposal, old = sigmas[c];

sigmas[c] = randu(min, max);i ( i i )proposal = -mixLLK(n, data, dim, probs, means, sigmas);

if (accept_proposal(llk, proposal, temperature))return proposal;

sigmas[c] = old;return llk;}

Page 28: 615.18 -- Simulated Annealing - umich.edu

Updating Mixture Proportions

Mixture proportions must sum to 1.0

When updating one proportion, must take others into account

Select a component at random• Increase or decrease probability by ~20%• Rescale all proportions so they sum to 1.0

Page 29: 615.18 -- Simulated Annealing - umich.edu

C Code:C Code:Vector Utility Functionsdouble * duplicate_vector(double * v, int dim)

{int i;double * dup = alloc_vector(dim);

for (i = 0; i < dim; i++)dup[i] = v[i];

return dup;}

void copy_vector(double * dest, double * source, int dim){{for (i = 0; i < dim; i++)

dest[i] = source[i];}

Page 30: 615.18 -- Simulated Annealing - umich.edu

C Code:C Code:Changing Mixture Proportionsdouble sa_probs(int dim, double * probs, double * means,

double * sigmas, double llk, double temperature){int i, c = Random() * dim;double proposal, * save_probs = duplicate_vector(probs, dim);

probs[c] *= randu(0.8, 1.25);adjust_probs(probs, dim);

proposal = -mixLLK(n, data, dim, probs, means, sigmas);if (accept_proposal(llk, proposal, temperature))

llk = proposal;lelse

copy_vector(probs, save_probs, dim);

free_vector(save_probs, dim);t llkreturn llk;

}

Page 31: 615.18 -- Simulated Annealing - umich.edu

C Code:C Code:Adjusting Probabilities

Th f ll i f i b bili i l 1 0The following function ensures probabilities always sum to 1.0

void adjust_probs(double * probs, int dim){{int i;double sum = 0.0;

for (i = 0; i < dim; i++)sum += probs[i];

for (i = 0; i < dim; i++)probs[i] /= sum;

}

Page 32: 615.18 -- Simulated Annealing - umich.edu

Simulated Annealing Procedure

Cycle through temperatures

At each temperature, evaluate proposed changes to mean, variance and mixingchanges to mean, variance and mixing proportions

Page 33: 615.18 -- Simulated Annealing - umich.edu

C Code: Simulated AnnealingC Code: Simulated Annealingdouble sa(int k, double * probs, double * means, double * sigmas, double eps)

{double llk = -mixLLK(n, data, k, probs, means, sigmas);double temperature = MAX TEMP; int choice, N;double temperature MAX_TEMP; int choice, N;

double lo = min(data, n), hi = max(data, n);double stdev = stdev(data, n), sdhi = 2.0 * stdev, sdlo = 0.1 * stdev;

while (temperature > eps) {while (temperature > eps) {for (N = 0; N < 1000; N++)

switch (choice = Random() * 3){case 0 :

llk = sa probs(k probs means sigmas llk temperature);llk = sa_probs(k, probs, means, sigmas, llk, temperature);break;

case 1 : llk = sa_means(k, probs, means, sigmas, llk, temperature, lo, hi);break;

case 2 :case 2 :llk = sa_sigmas(k, probs, means, sigmas, llk, temperature, sdlo, sdhi);

}temperature *= 0.90; }

return llk;}}

Page 34: 615.18 -- Simulated Annealing - umich.edu

Example ApplicationExample ApplicationOld Faithful Eruptions (n = 272)

Old Faithful Eruptions

ncy

1520

Freq

ue

510

1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0

05

Duration (mins)

Page 35: 615.18 -- Simulated Annealing - umich.edu

E-M Algorithm:E M Algorithm:A Mixture of Three Normals

Fi 8Fit 8 parameters• 2 proportions, 3 means, 3 variances

R i d b t 150 l tiRequired about ~150 evaluations• Found log-likelihood of ~267.89 in 42/50 runs• Found log-likelihood of ~263.91 in 7/50 runs

The best solutions …• Components contributing .160, 0.195 and 0.644• Component means are 1 856 2 182 and 4 289Component means are 1.856, 2.182 and 4.289• Variances are 0.00766, 0.0709 and 0.172

• Maximum log-likelihood = -263.91g

Page 36: 615.18 -- Simulated Annealing - umich.edu

Three Components

Old Faithful Eruptions

0.6

0.7

Fitted Distribution

quen

cy

1520

0.4

0.5

ensi

ty

Freq

510

0.1

0.2

0.3D

e

Duration (mins)

1 2 3 4 5 6

0

1 2 3 4 5 6

0.0

Duration (mins)

Page 37: 615.18 -- Simulated Annealing - umich.edu

Three Components

Old Faithful Eruptions

0.8

Fitted Density

quen

cy

1520

0.6

ensi

ty

Freq

510

0.2

0.4De

Duration (mins)

1 2 3 4 5 6

0

1 2 3 4 5 6

0.0

Duration (mins)

Page 38: 615.18 -- Simulated Annealing - umich.edu

Simulated Annealing:Simulated Annealing:Mixture of Three Normals

Fit 8 parameters• 2 proportions, 3 means, 3 variances

Required about ~100,000 evaluations• Found log-likelihood of ~267.89 in 30/50 runs• Found log-likelihood of ~263.91 in 20/50 runsg• With slower cooling and 500,000 evaluations, minimum found in

32/50 cases

100,000 evaluations seems like a lot…• However, consider that even a 5 point grid search along

8 dimensions would require ~400 000 evaluations!8 dimensions would require 400,000 evaluations!

Page 39: 615.18 -- Simulated Annealing - umich.edu

Convergence for Convergence for Simulated Annealing

6

4

5

ans

2

3Mea

10 50 100 150 200 250

Thousands

Iterations-250

Likelihood

-4500 50 100 150 200 250

Thousands

Page 40: 615.18 -- Simulated Annealing - umich.edu

Convergence for Convergence for Simulated Annealing

LogLikelihood

-200

-300

kelih

ood

-400

LogL

ik

-5000 25 50 75 100

Thousands

Iteration

Page 41: 615.18 -- Simulated Annealing - umich.edu

Convergence for Convergence for Simulated Annealing

LogLikelihood

-260

-264

-262

kelih

ood

-268

-266

LogL

ik

-27025 50

Thousands

Iteration

Page 42: 615.18 -- Simulated Annealing - umich.edu

Importance of Annealing StepEvaluated a greedy algorithm

G t d 100 000 d t i thGenerated 100,000 updates using the same scheme as for simulated annealing

However, changes leading to decreases in likelihood were never accepted

Led to a minima in only 4/50 cases.

Page 43: 615.18 -- Simulated Annealing - umich.edu

E-M Algorithm:E M Algorithm:A Mixture of Four Normals

Fit 11 parameters• 3 proportions, 4 means, 4 variances

Required about ~300 evaluations• F d l lik lih d f 267 89 i 1/50• Found log-likelihood of ~267.89 in 1/50 runs• Found log-likelihood of ~263.91 in 2/50 runs• Found log-likelihood of ~257.46 in 47/50 runs

"Appears" more reliable than with 3 components

Page 44: 615.18 -- Simulated Annealing - umich.edu

Simulated Annealing:Simulated Annealing:A Mixture of Four Normals

Fit 11 parameters• 3 proportions, 4 means, 4 variances

Required about ~100,000 evaluations• Found log-likelihood of ~257.46 in 50/50 runs

Again a grid search in 11 dimensions wouldAgain, a grid-search in 11 dimensions would only allow ~4-5 points per dimension and find a worse solutiona o se so ut o

Page 45: 615.18 -- Simulated Annealing - umich.edu

Four Components

Old Faithful Eruptions

1.0

quen

cy

1520

0.6

0.8

ensi

ty

Freq

510

0.2

0.4

De

Duration (mins)

1 2 3 4 5 6

0

1 2 3 4 5 6

0.0

Duration

Page 46: 615.18 -- Simulated Annealing - umich.edu

Today …

Simulated Annealing

Markov-Chain Monte-Carlo method

Searching for global minimum among local minimalocal minima

Page 47: 615.18 -- Simulated Annealing - umich.edu

Next Lecture

More detailed discussion of• MCMC methods• Simulated Annealing and Probability Distributionsg y

Introduction to Gibbs samplingIntroduction to Gibbs sampling

Page 48: 615.18 -- Simulated Annealing - umich.edu

References

Brooks and Morgan (1995)Optimization using simulated annealingThe Statistician 44:241-257

Kirkpatrick et al (1983)Optimization by simulated annealingp y gScience 220:671-680

Page 49: 615.18 -- Simulated Annealing - umich.edu
Page 50: 615.18 -- Simulated Annealing - umich.edu

I/O Notes for Problem Set 7

To read data, use "stdio.h" library

Functions for opening and closing files•FILE * fopen(char * name, char * mode);• id f l (FILE * f)•void fclose(FILE * f);

Functions for reading and writing to filesFunctions for reading and writing to files• I recommend fprintf and fscanf• Analogous to printf and scanf

Page 51: 615.18 -- Simulated Annealing - umich.edu

fopen() functionT i lTypical usage:

FILE * f = fopen("file.txt", "rt");if (f == NULL)if (f == NULL)

{printf("Error opening file\n");exit(1);exit(1);}

/* Rest of code follows *// /

File mode combines to characters:• "w" for writing and "r" for reading• f t t d f bi• "t" for text and "b" for binary

Page 52: 615.18 -- Simulated Annealing - umich.edu

fclose()Makes a file available to other programs

fclose(f);( );

To return to the beginning of a file use:

rewind(f);

To check whether the end-of-file has been reached:

feof(f);( );

Page 53: 615.18 -- Simulated Annealing - umich.edu

Writing to a FileW i i i d d blWriting a an integer and a double

int i = 2int i = 2;double x = 2.11;

fprintf(f, "My secret numbers are ""%d and %f\n", i, x);

As usual, use %d for integers, %f for doubles, %s for strings%s for strings

Page 54: 615.18 -- Simulated Annealing - umich.edu

Reading from a FileR di i d d blReading a an integer and a double

int i;double x;fscanf(f, "%d %lf\n", &i, &x);

As usual, use %d for integers, %f for floats, %lf for doubles, %s for strings

Writing %*t [where t is one of the type characters above] reads a value and discards it without storing.

R t th b f it t f d f llReturns the number of items transferred successfully

Page 55: 615.18 -- Simulated Annealing - umich.edu

Counting Items in FileFILE * f = fopen(filename, "rt");double item;int items = 0;

// Count the number of items in file

// First skip header// First skip headerfscanf(f, "%*s ");

// Read and count floating point values// g p// until file is exhaustedwhile (!feof(f) && fscanf(f, "%lf ", &item) == 1)

items++;

Page 56: 615.18 -- Simulated Annealing - umich.edu

Reading Items from a file//// Return to the beginning of filerewind(f);

// Skip header again// S p eade agafscanf(f, "%*s ");

// Allocate enough memoryd t ll t ( it )data = alloc_vector(n = items);

// Read each item into appropriate locationfor (i = 0; i < items; i++)

fscanf(f, "%lf ", &data[i]);

// Done with this file!fclose(f);fclose(f);