
Seminar

University of Cyprus

Department of Public and Business Administration

September 2004

An Introduction to

Optimization Heuristics

Manfred Gilli

Department of Econometrics, University of Geneva and FAME

www.unige.ch/ses/metri/gilli

1. Optimization heuristics (an overview)

2. Threshold Accepting

3. Portfolio optimization with TA


References

Winker, P. (2001): Optimization Heuristics in Econometrics. Wiley, Chichester.

Winker, P. and M. Gilli (2004): Applications of optimization heuristics to estimation and modelling problems. Computational Statistics & Data Analysis 47, 211–223. (www.sciencedirect.com/csda/)

Winker, P. and D. Maringer (2005): Threshold Accepting in Economics and Statistics. (To appear in Kluwer Applied Optimization Series.)

Gilli, M. and E. Kellezi (2002): Portfolio Optimization with VaR and Expected Shortfall. In Computational Methods in Decision-making, Economics and Finance (Eds. E.J. Kontoghiorghes, B. Rustem and S. Siokos), 165–181, Kluwer Applied Optimization Series.

Gilli, M. and E. Kellezi (2002): The Threshold Accepting Heuristic for Index Tracking. In Financial Engineering, E-Commerce, and Supply Chain (Eds. P. Pardalos and V.K. Tsitsiringos), 1–18, Kluwer Academic Publishers, Boston.

Gilli, M. and P. Winker (2003): A Global Optimization Heuristic for Estimating Agent Based Models. Computational Statistics & Data Analysis 42, 299–312. (www.sciencedirect.com/csda/)


Lecture 1

Optimization heuristics

(an overview)

Outline

• Standard optimization paradigm

• Heuristic optimization paradigm

• Overview of optimization heuristics

– Simulated annealing

– Threshold accepting

– Tabu search

– Genetic algorithm

– Ant colonies

• Elements for a classification

– Basic characteristics

– Hybrid meta-heuristics


The standard optimization paradigm

Optimization problems in estimation and modelling

typically expressed as:

$$\max_{x \in X} f(x) \qquad (1)$$

search space $X \subset \mathbb{R}^n$ (possibly discrete)

(1) often synonymous with solution $x^{\mathrm{opt}}$

assumed to exist and frequently to be unique!

McCullough and Vinod (1999, p. 635) state:

‘Many textbooks convey the impression that all one

has to do is use a computer to solve the problem,

the implicit and unwarranted assumptions being

that the computer’s solution is accurate and that

one software package is as good as any other ’.

Obviously this assumption is not necessarily met

Rather than being globally convex and well behaved

functions for real applications may look like →



Example from statistics:

Least median of squares estimator (LMS)

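$$y_i = x_i^T \theta + \varepsilon_i, \qquad i = 1, \ldots, N$$

$$\theta_{LMS} = \arg\min_{\theta} Q_{LMS}(\theta)$$

$$Q_{LMS}(\theta) = \mathrm{med}_i\,(r_i^2) \qquad \text{(median of squared residuals)}$$

$$r_i^2 = (y_i - x_i^T \theta)^2$$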
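A minimal sketch of the LMS objective, assuming the regressors are given as a list of rows; the function name `lms_objective` and the toy data are illustrative only. It shows why the objective is robust to an outlier but not smooth in θ.

```python
import statistics

def lms_objective(theta, X, y):
    """Median of squared residuals for the linear model y_i = x_i' theta + e_i."""
    squared_residuals = []
    for x_i, y_i in zip(X, y):
        fitted = sum(x_ij * t_j for x_ij, t_j in zip(x_i, theta))
        squared_residuals.append((y_i - fitted) ** 2)
    return statistics.median(squared_residuals)

# Toy data (hypothetical): y = 1 + 2 x exactly, except for one gross outlier.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y = [1.0, 3.0, 5.0, 7.0, 100.0]

q_true = lms_objective([1.0, 2.0], X, y)    # the outlier does not move the median
q_other = lms_objective([0.0, 0.0], X, y)
```

Because the median is piecewise constant in the ordering of the residuals, gradient-based methods have little to work with here.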


Example: Objective function for the estimation of

the parameters of an agent based model of financial

markets

[Figure: surface plot of the objective function over the two parameters ε and δ]


Only objective functions for two-dimensional problems can be illustrated. In real applications it is most likely that we have to optimize with respect to many variables, which makes the problem much more complex.

Classical optimization paradigm understood as:

• solution is identified by means of enumeration

or differential calculus

• existence of (unique) solution presumed

• convergence of classical optimization methods

for the solution of the corresponding first-order

conditions

Many optimization problems in statistics (e.g. OLS

estimation) fall within this category

However many optimization problems resist this

standard approach


Limits of the classical optimization paradigm

• Problems which do not fulfill the requirements of these methods

• Cases where the standard optimization paradigm can be applied, but problem sizes may hinder efficient calculation.

Classification (relative to the classical optimization paradigm) of the universe of estimation and modelling problems:

[Figure: the universe of problems, split into continuous and discrete problems, with three regions labelled "easy to solve", "tractable by standard approximation methods" and "application of standard methods will probably fail"]


• Set X of possible solutions:

– continuous

– discrete

• Easy to solve:

– continuous: allowing for an analytical solution (e.g. LS estimation)

– discrete: allowing for a solution by enumeration for small-scale problems

• Tractable by standard approximation methods:

Solution can be approximated reasonably well

by standard algorithms (e.g. gradient methods)

• Complementary set: Straightforward application of standard methods will, in general, not even provide a good approximation of the global optimum

The heuristic optimization paradigm

Methods:

• Based on concepts found in nature

• Have become feasible as a consequence of

growing computational power

• Although aiming at high-quality solutions, they cannot claim to produce the exact solution in every case with certainty

Nevertheless, a stochastic high-quality approximation of a global optimum is probably more valuable than a deterministic poor-quality local minimum provided by a classical method, or no solution at all.

• Easy to adapt to different problems

• Side constraints on the solution can be taken into account at low additional cost


Cross–Examination of Optimization Paradigms

Property            Classical   Heuristic

Availability        √           growing

Deterministic       √           sometimes

Efficiency          √/–         –

Solution quality    √/–         +

Multi purpose       –           √


Overview of optimization heuristics

Two broad classes:

• Construction methods (greedy algorithms)

• Local search methods

Solution space not explored systematically

A particular heuristic is characterized by the

way the walk through the solution domain is

organized


Classical local search for minimization

1: Generate current solution xc

2: while stopping criteria not met do

3: Select xn ∈ N (xc) (neighbor to current sol.)

4: if f(xn) < f(xc) then xc = xn

5: end while

Selection of neighbor xn and criteria for acceptance

define the walk through the solution space

Stopping criteria (a given number of iterations)
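The local search loop can be sketched as follows. This is only an illustration under simple assumptions: the stopping criterion is a fixed number of iterations, and the objective, the ±1 neighborhood and all names are hypothetical.

```python
import random

def local_search(f, x0, neighbor, n_iter, rng):
    """Classical local search for minimization: accept only improvements."""
    xc, fc = x0, f(x0)
    for _ in range(n_iter):              # stopping criterion: iteration count
        xn = neighbor(xc, rng)           # select a neighbor of the current solution
        fn = f(xn)
        if fn < fc:                      # downhill moves only
            xc, fc = xn, fn
    return xc, fc

def f(x):
    return (x - 3) ** 2                  # toy objective on the integers

def neighbor(x, rng):
    return x + rng.choice([-1, 1])       # neighborhood: one step left or right

rng = random.Random(0)
x_best, f_best = local_search(f, 20, neighbor, 500, rng)
```

On this convex toy problem the walk cannot get trapped; on a multimodal objective the same loop stops at the first local minimum it reaches, which is what the meta-heuristics below try to avoid.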

Classical meta-heuristics:

• Simulated annealing

• Tabu search

• Genetic algorithms

• Ant colonies

Different rules for choice and/or acceptance of

neighbor solution

All (except Tabu search) accept uphill moves

(in order to escape local minima)


Simulated annealing (SA)

• Kirkpatrick, Gelatt and Vecchi (1983)

• Based on analogy between combinatorial optimization and annealing process of solids

• Improvement of solution for move from xc to xn always accepted

• Accepts uphill move, only with given probability (decreases in a number of rounds to zero)

1: Generate current solution xc, initialize Rmax and T

2: for r = 1 to Rmax do

3: while stopping criteria not met do

4: Compute xn ∈ N (xc) (neighbor to current sol.)

5: Compute ∆ = f(xn) − f(xc) and generate u (uniform random variable)

6: if (∆ < 0) or (e^(−∆/T) > u) then xc = xn

7: end while

8: Reduce T

9: end for

[Figure: acceptance probability e^(−∆/T) as a function of ∆/T, compared with the uniform draw u on [0, 1]]
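A hedged Python sketch of the SA loop above. Assumptions not in the slides: the inner loop runs a fixed number of steps, the temperature is reduced geometrically by a factor `cooling`, and the best solution seen is recorded; the test objective is made up.

```python
import math
import random

def simulated_annealing(f, x0, neighbor, n_rounds, n_steps, t0, cooling, rng):
    xc, fc = x0, f(x0)
    best_x, best_f = xc, fc                  # bookkeeping added for this sketch
    temperature = t0
    for _ in range(n_rounds):
        for _ in range(n_steps):             # inner loop: fixed number of steps
            xn = neighbor(xc, rng)
            fn = f(xn)
            delta = fn - fc
            u = rng.random()                 # uniform random variable on [0, 1)
            if delta < 0 or math.exp(-delta / temperature) > u:
                xc, fc = xn, fn              # improvement, or accepted uphill move
                if fc < best_f:
                    best_x, best_f = xc, fc
        temperature *= cooling               # reduce T (assumed geometric schedule)
    return best_x, best_f

def f(x):
    return (x - 3.0) ** 2 + 2.0 * math.sin(5.0 * x) ** 2   # multimodal toy objective

def neighbor(x, rng):
    return x + rng.uniform(-0.5, 0.5)

rng = random.Random(1)
x_sa, f_sa = simulated_annealing(f, 10.0, neighbor, 50, 200, 1.0, 0.9, rng)
```

As T shrinks, e^(−∆/T) tends to zero for any ∆ > 0, so the loop gradually turns into the classical local search.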


Threshold accepting (TA)

• Dueck and Scheuer (1990)

• Deterministic analog of Simulated Annealing

• Sequence of temperatures T replaced by

sequence of thresholds τ .

• Statement 6. of SA algorithm becomes:

if ∆ < τ then xc = xn

• Statement 8: threshold τ reduced instead of T


Tabu search (TS)

• Glover and Laguna (1997)

• Designed for exploration of discrete search spaces with a finite set of neighbor solutions

• Avoids cycling (visiting the same solution more than once) by use of a short-term memory (tabu list of the most recently visited solutions)

• Statement 3: choice of xn may or may not examine all neighbors of xc

If more than one element is considered, xn corresponds to the best neighbor solution

1: Generate current solution xc and initialize tabu list T = ∅

2: while stopping criteria not met do

3: Compute xn ∈ N (xc) and xn ∉ T

4: if f(xn) < f(xc) then xc = xn and T = T ∪ {xn}

5: Update memory

6: end while

• Statement 5: a simple way to update memory is to remove older entries from the tabu list

• Stopping criterion: given number of iterations or number of consecutive iterations without improvement
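The tabu search steps can be sketched as below, following the acceptance rule of the listing (only improving moves are accepted). Assumptions for illustration: all neighbors are examined, the memory is a fixed-length queue that drops its oldest entry, and the integer test problem is hypothetical.

```python
from collections import deque

def tabu_search(f, x0, neighbors, n_iter, tabu_size):
    xc, fc = x0, f(x0)
    tabu = deque(maxlen=tabu_size)           # short-term memory, initially empty
    for _ in range(n_iter):
        candidates = [x for x in neighbors(xc) if x not in tabu]
        if not candidates:
            break                            # every neighbor is tabu
        xn = min(candidates, key=f)          # best admissible neighbor
        if f(xn) < fc:
            xc, fc = xn, f(xn)
            tabu.append(xn)                  # oldest entry drops out automatically
    return xc, fc

def f(x):
    return (x - 7) ** 2

def neighbors(x):
    return [x - 2, x - 1, x + 1, x + 2]

result = tabu_search(f, 0, neighbors, 20, 3)
```

Many tabu search variants in the literature also move to the best admissible neighbor when it is worse than xc; that is a different acceptance rule from the one sketched here.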


Genetic algorithm (GA)

• Imitates evolutionary process of species that

sexually reproduce

• Does not operate on a single current solution, but on a set of current solutions (population)

• New individuals P′′ generated with cross-over: combines part of the genetic patrimony of each parent and applies a random mutation

If the new individual (child) inherits good characteristics from its parents → higher probability to survive


1: Generate current population P of solutions

2: while stopping criteria not met do

3: Select P′ ⊂ P (mating pool), initialize P′′ = ∅ (children)

4: for i = 1 to n do

5: Select individuals xa and xb at random from P′

6: Apply cross-over to xa and xb to produce xchild

7: Randomly mutate produced child xchild

8: P′′ = P′′ ∪ {xchild}

9: end for

10: P = survive(P′, P′′)

11: end while

Statement 3: Set of starting solutions

Statements 4–10: Construction of neighbor sol.

Survivors P (new population) formed either by:

• last generated individuals P′′ (children)

• P′′ ∪ fittest from P′

• only the fittest from P′′

• the fittest from P′ ∪ P′′
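A small sketch of the GA loop for bit strings. The choices here are assumptions made for illustration: the mating pool is the better half of the population, cross-over is one-point, mutation flips each bit with probability `p_mut`, and survivors are the fittest from P′ ∪ P′′ (the last option in the list above).

```python
import random

def genetic_algorithm(f, n_bits, pop_size, n_children, n_gen, p_mut, rng):
    P = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(n_gen):                                   # fixed number of generations
        mating_pool = sorted(P, key=f)[: pop_size // 2]      # P'
        children = []                                        # P''
        for _ in range(n_children):
            xa, xb = rng.sample(mating_pool, 2)              # two distinct parents
            cut = rng.randint(1, n_bits - 1)
            child = xa[:cut] + xb[cut:]                      # one-point cross-over
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        P = sorted(mating_pool + children, key=f)[:pop_size] # fittest of P' and P''
    return min(P, key=f)

def zeros(x):
    return x.count(0)                                        # toy objective to minimize

rng = random.Random(42)
best = genetic_algorithm(zeros, 20, 20, 20, 60, 0.02, rng)
```

With this survival rule the best individual is never lost, so the best objective value cannot deteriorate from one generation to the next.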


Ant colonies (AC)

• Colorni, Dorigo and Maniezzo (1992)

• Imitates the way ants search for food and find

their way back to their nest

• First an ant explores its neighborhood randomly.

As soon as a source of food is found it starts

to transport food to the nest leaving traces of

pheromone on the ground which guide other

ants to the source

• Intensity of the pheromone traces depends on the quantity and quality of food available at the source as well as on the distance between source and nest: over a short distance more ants will travel on the same trail in a given time interval.

• As ants preferably travel along important trails

their behavior is able to optimize their work

• Pheromone trails evaporate and once a source

of food is exhausted the trails will disappear and

the ants will start to search for other sources


• The search area of the ant corresponds to a

discrete set of solutions

• The amount of food is associated with an objective function

• The pheromone trail is modelled with an adaptive memory

1: Initialize pheromone trail

2: while stopping criteria not met do

3: for all ants do

4: Deposit ant randomly

5: while solution incomplete do

6: Select next element randomly according to pheromone trail

7: end while

8: end for

9: Update pheromone trail

10: end while


Reinforced process:

• Within same time more ants can pass shorter route

• More pheromone on shorter route

• More ants attracted


Real life ants:

• Leave chemical marks (pheromone)

• Use pheromone for orientation

• Prefer trails with high pheromone

How do ants know where to go ?

• Ant at point i

• τij intensity of pheromone trail from i → j

• ηij visibility (constant) from i → j

• probability to go to j (simplest version):

$$p_{ij} = \frac{\tau_{ij}\,\eta_{ij}}{\sum_k \tau_{ik}\,\eta_{ik}}$$

Trail update:

• Old pheromone evaporates partly ($0 < \rho < 1$)

• Ant on route i → j with length $\ell_{ij}$ spreads q pheromone

$$\Delta\tau_{ij} = \frac{q}{\ell_{ij}}$$

• New pheromone trail

$$\tau_{ij}^{t+1} = \rho\,\tau_{ij}^{t} + \Delta\tau_{ij}^{t}$$
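The transition rule and the trail update can be sketched as follows. The data layout (a dictionary keyed by edges, one list entry per ant traversal) and the numbers are assumptions made for this example only.

```python
def transition_probabilities(tau_i, eta_i):
    """p_ij proportional to tau_ij * eta_ij over the candidates j reachable from i."""
    weights = [t * e for t, e in zip(tau_i, eta_i)]
    total = sum(weights)
    return [w / total for w in weights]

def update_trail(tau, lengths, rho, q, traversals):
    """tau_new = rho * tau_old + sum of q / length over the ants that used the edge."""
    new_tau = {}
    for edge, value in tau.items():
        deposit = sum(q / lengths[edge] for used in traversals if used == edge)
        new_tau[edge] = rho * value + deposit
    return new_tau

tau = {("A", "B"): 1.0, ("A", "C"): 1.0}
lengths = {("A", "B"): 1.0, ("A", "C"): 2.0}
eta = [1.0 / lengths[e] for e in tau]            # visibility taken as 1 / length (assumption)

p = transition_probabilities(list(tau.values()), eta)
tau_next = update_trail(tau, lengths, rho=0.5, q=1.0,
                        traversals=[("A", "B"), ("A", "C")])
```

After one update the shorter edge carries more pheromone, so it is chosen with higher probability in the next round, which is the reinforcement described above.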


Applications:

• Travelling salesman problem

• Quadratic assignment problem

• Job scheduling problem

• Graph coloring problem

• Sequential ordering

References:

• Colorni, Dorigo and Maniezzo (1992)

• Overview on different versions and applications: Bonabeau, Dorigo and Theraulaz (1999): Swarm Intelligence

Elements for classification

Meta-heuristic: general skeleton of an algorithm

(applicable to a wide range of problems)

May evolve to a particular heuristic (if specialized

to solve a particular problem)

• Meta-heuristics: made up by different compo-

nents

• If components from different meta-heuristics

are assembled → hybrid meta-heuristic

Proliferation of heuristic optimization methods → need for taxonomy or classification


Basic characteristics of the meta-heuristics:

• Trajectory method: current solution slightly

modified by searching within the neighborhood

of the current solution (SA, TS)

• Discontinuous method: full solution space available for new solution. Discontinuity induced by generation of starting solutions (GA, AC) corresponds to jumps in search space

• Single agent method: one solution per iteration processed (SA, TS)

• Multi-agent or population based method: Population of searching agents all of which contribute to the collective experience (GA, AC)

• Guided search (search with memory usage): Incorporates additional rules and hints on where to search (GA: population represents memory of recent search experience, AC: pheromone matrix represents adaptive memory of previously visited solutions, TS: tabu list provides short term memory)

• Unguided search or memoryless method: relies entirely on the search heuristic


Meta-heuristics and their features:

Features                        SA    TA    TS    GA    AC

Trajectory methods              √     √     √     (√)   (√)

Discontinuous methods           no    no    no    √     √

Single agent                    √     √     √     no    no

Population based                no    no    no    √     √

Guided search (memory)          no    no    √     √     √

Unguided search (memoryless)    √     √     no    no    no


Hybrid meta-heuristics (HMH):

Combine elements of classical meta-heuristics → a large number of new techniques can be imagined

Motivated by need to achieve tradeoff between:

• capabilities to explore search space

• possibility to exploit experience accumulated

during search

Classification combines hierarchical and flat scheme:

[Figure: classification tree. Hierarchical part: Low-level (L) or High-level (H), each either Relay (R) or Co-evolutionary (C). Flat part: Homogeneous or Heterogeneous, Global or Partial, General or Special.]


Hierarchical classification of hybridizations

• Low-level: replaces component of given MH by

component from another MH

• High-level: different MH are self-contained

• Relay: combines different MH in a sequence

• Co-evolutionary: different MH cooperate


Examples:

• Low-level Relay: (not very common) e.g. SA where neighbor xn is obtained as: select xi in larger neighborhood of xc and perform a descent local search. If this point is not accepted return to xc (not xi).

• Low-level Co-evolutionary: GA and AC perform well in exploration of search space but are weak in exploitation of solutions found → hybridization for GA: greedy heuristic for crossover and TS for mutation

• High-level Relay: e.g. greedy heuristic to generate initial population of GA and/or SA and TS to improve population obtained by GA

Another ex.: use heuristic to optimize another heuristic, i.e. find optimal values for parameters

• High-level Co-evolutionary: many self-contained algorithms cooperate in a parallel search to find an optimum


Flat classification of hybridizations

• Homogeneous versus Heterogeneous: same MH

used versus combination of different MH

• Global versus Partial: all algorithms explore

same solution space versus partitioned solution

space

• Specialist versus General: combination of MH

which solve different problems versus all MH

solve the same problem

e.g. a high-level relay hybrid for the optimization of another heuristic is a specialist hybrid


Lecture 2

Threshold Accepting (TA)

• Introduction

Builds on the tutorial given by P. Winker at the “Computational Management Science” Conference and Workshop on “Computational Econometrics and Statistics”, University of Neuchâtel, Switzerland, 2–5 April 2004.


Basic features of TA

• Similar to Simulated Annealing

• Local search heuristic (suggests slight random modifications to the current solution and thus gradually moves through the search space)

• Suited for problems where solution space has

a local structure and where we can define a

neighborhood around the solution

• Accepts uphill moves to escape local optima

(on a deterministic criterion)


What do we expect from a heuristic?

• A good approximation to the global optimum

• To be robust to problem changes with respect

to tuning parameters

• Easy to adapt to many problem instances


• Local search similar to Simulated Annealing

• Allows uphill moves

• Requires local structure on search space

• Requires a threshold sequence

• Converges asymptotically to global optimum


Implementation involves definition of:

• Neighborhood (Local structure on search space)

• Objective function and constraints

• Threshold sequence


Neighborhood definition

• Ω search space

• for each element x ∈ Ω we define the neighborhood N (x) ⊂ Ω

(cannot be generated !!)

• for current solution xc we compute xn ∈ N (xc)

• Neighborhood defined with ε-spheres

$$N(x^c) = \{\, x^n \mid x^n \in \Omega,\ \|x^n - x^c\| < \varepsilon \,\}$$

with ‖·‖ the Euclidean or Hamming distance
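Two hypothetical neighbor generators for ε-spheres, one for each distance. Both are sketches under assumptions: membership in Ω is not checked, and the Euclidean draw is not uniform over the sphere (the radius is drawn uniformly).

```python
import math
import random

def euclidean_neighbor(xc, eps, rng):
    """Draw xn with ||xn - xc|| < eps by scaling a random direction."""
    direction = [rng.gauss(0.0, 1.0) for _ in xc]
    norm = math.sqrt(sum(d * d for d in direction))
    radius = eps * rng.random()                  # radius in [0, eps)
    return [x + radius * d / norm for x, d in zip(xc, direction)]

def hamming_neighbor(xc, max_flips, rng):
    """Flip between 1 and max_flips distinct bits of a 0/1 vector."""
    k = rng.randint(1, max_flips)
    xn = list(xc)
    for pos in rng.sample(range(len(xc)), k):
        xn[pos] = 1 - xn[pos]
    return xn

rng = random.Random(1)
x_real = [0.2, 0.3]
x_bits = [0, 1, 0, 1, 1]
n_real = euclidean_neighbor(x_real, 0.1, rng)
n_bits = hamming_neighbor(x_bits, 2, rng)
```

Neither function enumerates N(xc); each only draws one element from it, which is all the search loop needs.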


Local structure

• Objective function should exhibit local behavior

with regard to the chosen neighborhood

For elements in N (xc) the value of the objective function should be close to f(xc) (closer than for randomly selected points)

Trade-off between large neighborhoods, which guarantee non-trivial projections, and small neighborhoods with a real local behavior of the objective function.

• Neighborhood relatively easy to define for functions with real-valued variables (more difficult for combinatorial problems)




Pseudo-code for TA

1: Initialize nR, nS and τr, r = 1,2, . . . , nR

2: Generate current solution xc ∈ X

3: for r = 1 to nR do

4: for i = 1 to nS do

5: Generate xn ∈ N (xc) (neighbor of xc)

6: if f(xn) < f(xc) + τr then

7: xc = xn

8: end if

9: end for

10: end for
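The TA pseudo-code translates almost line by line into Python. The sketch below assumes the threshold sequence is passed in as a list; the objective, the neighborhood and the thresholds themselves are made-up test values.

```python
import random

def threshold_accepting(f, x0, neighbor, thresholds, n_steps, rng):
    xc, fc = x0, f(x0)
    for tau in thresholds:                   # rounds r = 1, ..., nR
        for _ in range(n_steps):             # steps i = 1, ..., nS
            xn = neighbor(xc, rng)
            fn = f(xn)
            if fn < fc + tau:                # accept unless worse by tau or more
                xc, fc = xn, fn
    return xc, fc

def f(x):
    return (x - 3.0) ** 2

def neighbor(x, rng):
    return x + rng.uniform(-0.5, 0.5)

rng = random.Random(7)
taus = [2.0, 1.0, 0.5, 0.1, 0.0]             # decreasing thresholds (assumed values)
x_ta, f_ta = threshold_accepting(f, 10.0, neighbor, taus, 200, rng)
```

With a final threshold of zero, the last round behaves like the classical local search and only accepts strict improvements.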


Objective function

• Objective function is problem specific (not necessarily smooth or differentiable)

• Performance depends on fast (and exact) calculation. This can be a problem if the objective function is the result of a Monte Carlo simulation.

• Local updating (to improve performance): directly compute ∆, instead of computing ∆ from f(xn) − f(xc)

Ex: Traveling salesman problem

[Figure: two tours through the cities A, B, C and D, before and after the move]

$$\Delta = d(A, C) + d(C, B) + d(B, D) - d(A, B) - d(B, C) - d(C, D)$$
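A sketch of local updating for this move, assuming the move exchanges two consecutive cities B and C on the segment A, B, C, D of a closed tour. The coordinates and function names are illustrative; the check compares the local ∆ with a full recomputation.

```python
import math

def tour_length(tour, coords):
    """Length of the closed tour (the last city connects back to the first)."""
    return sum(math.dist(coords[tour[k]], coords[tour[(k + 1) % len(tour)]])
               for k in range(len(tour)))

def swap_delta(tour, k, coords):
    """Change in length when tour[k] and tour[k+1] are exchanged (1 <= k <= len-3)."""
    a, b, c, d = (coords[city] for city in tour[k - 1:k + 3])
    return (math.dist(a, c) + math.dist(c, b) + math.dist(b, d)
            - math.dist(a, b) - math.dist(b, c) - math.dist(c, d))

coords = {"A": (0, 0), "B": (3, 0), "C": (1, 2), "D": (4, 3), "E": (0, 4)}
old_tour = ["A", "B", "C", "D", "E"]
new_tour = ["A", "C", "B", "D", "E"]

delta_local = swap_delta(old_tour, 1, coords)
delta_full = tour_length(new_tour, coords) - tour_length(old_tour, coords)
```

The local computation touches six distances regardless of the number of cities, whereas the full recomputation grows with the tour length.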


Constraints

• Search space Ω is a subspace $\Omega \subset \mathbb{R}^k$

• if subspace not connected or it is difficult to generate elements in Ω:

– use $\mathbb{R}^k$ as search space

– add penalty term to objective function if $x^n \notin \Omega$

– increase penalty term during iterations
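One possible way to code the penalty idea, given here as an assumption rather than the author's implementation: the penalty is a weight times a violation measure that is zero inside Ω, and the weight doubles from round to round.

```python
def penalized(f, violation, weight):
    """Return g(x) = f(x) + weight * violation(x); violation(x) is 0 inside Omega."""
    def g(x):
        return f(x) + weight * violation(x)
    return g

def f(x):
    return x * x

def violation(x):
    return max(0.0, 1.0 - x)                 # hypothetical Omega = {x : x >= 1}

# One penalized objective per round, with an increasing weight (assumed schedule).
weights = [10.0 * 2 ** r for r in range(4)]  # 10, 20, 40, 80
objectives = [penalized(f, violation, w) for w in weights]
```

Early rounds let the search cross infeasible regions cheaply; later rounds make infeasible points increasingly unattractive.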


Lower bounds

• Theoretical lower bounds on the objective function help to assess the performance of the algorithm

• In particular, if $f^{\mathrm{sol}}$ = lower bound, a global optimum is identified


Threshold sequence

• Althöfer and Koschnick (1991) prove convergence for “appropriate threshold sequence”

• In practice the threshold sequence is computed from the empirical distribution of a sequence of ∆’s

1: for i = 1 to n do

2: Randomly choose x1 ∈ Ω

3: Compute x2 ∈ N (x1)

4: Compute ∆i = |f(x1) − f(x2)|

5: end for

6: Compute empirical distribution of trimmed ∆i, i = 1, . . . , ⌊0.8 n⌋

7: Provide percentiles Pi, i = 1, . . . , nR

8: Compute corresponding quantiles Qi, i = 1, . . . , nR

The threshold sequence τi, i = 1, . . . , nR corresponds to Qi, i = 1, . . . , nR
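A sketch of the data-driven threshold sequence. The slides do not state which percentiles are used, so the equally spaced, decreasing percentiles below are an assumption, as are all function names and the toy objective.

```python
import random

def threshold_sequence(f, random_solution, neighbor, n, n_rounds, rng, trim=0.8):
    deltas = []
    for _ in range(n):
        x1 = random_solution(rng)            # random point of the search space
        x2 = neighbor(x1, rng)               # one of its neighbors
        deltas.append(abs(f(x1) - f(x2)))
    deltas.sort()
    trimmed = deltas[: int(trim * n)]        # keep the smallest 80 percent
    m = len(trimmed)
    thresholds = []
    for r in range(n_rounds):
        p = (n_rounds - 1 - r) / n_rounds    # decreasing percentiles (assumption)
        thresholds.append(trimmed[int(p * (m - 1))])
    return thresholds

def f(x):
    return (x - 3.0) ** 2

rng = random.Random(3)
taus = threshold_sequence(f,
                          lambda r: r.uniform(-10.0, 10.0),
                          lambda x, r: x + r.uniform(-0.5, 0.5),
                          n=1000, n_rounds=10, rng=rng)
```

Trimming the largest ∆'s keeps a few extreme jumps from inflating the first thresholds.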


Empirical distribution of ∆’s

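[Figure: empirical distribution of the ∆’s; horizontal axis ticks at 0.18, 1.47 and 3.89 (×10⁻⁵), vertical axis ticks at 0, 0.3, 0.7 and 0.9]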


Restarting TA

• Stochastic search heuristics like TA can be represented as a stochastic mapping

$$\mathrm{TA} : \Omega \to f_{\min}, \qquad f_{\min} \sim D_{TA}(\mu, \sigma)$$

$f_{\min}$ is the random realization of the minimum found by the algorithm for a given random number sequence.

$D_{TA}$ is truncated from the left by

$$f_{\min}^{\mathrm{global}} = \inf\{\, f(x) \mid x \in \Omega \,\}$$

→ $D_{TA}$ not normal!

• Repeated application (with different seeds) of TA (i = 1, . . . , R) provides an empirical distribution of results ($f_{\min}$)

• Standard procedure reports: $\min\{\, f_{\min}^{i} \mid i = 1, \ldots, R \,\}$ and sometimes R

• We suggest providing: number of restarts R, the empirical mean and standard deviation or some quantiles
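A hedged sketch of such a report. The function `toy_run` merely stands in for one TA run with a given seed; the quantile convention (order statistic at index ⌊q(R−1)⌋) is one of several possible choices.

```python
import random
import statistics

def restart_summary(run, n_restarts, quantiles=(0.01, 0.05, 0.10), base_seed=0):
    """Run the heuristic with different seeds and summarize the f_min values."""
    results = sorted(run(random.Random(base_seed + i)) for i in range(n_restarts))
    summary = {"R": n_restarts,
               "min": results[0],
               "mean": statistics.mean(results),
               "sd": statistics.stdev(results)}
    for q in quantiles:
        summary[q] = results[int(q * (n_restarts - 1))]   # lower empirical quantile
    return summary

def toy_run(rng):
    # Stand-in for one stochastic search: best of 20 random evaluations.
    return min((rng.uniform(-5.0, 5.0) - 1.0) ** 2 for _ in range(20))

summary = restart_summary(toy_run, 50)
```

Reporting the whole dictionary rather than only `summary["min"]` lets a reader judge how typical the best result is.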


Question: How to choose the number of restarts?

For a given amount of computational resources (total iterations)

• The larger R, the better the distribution $D_{TA}(\mu_1, \sigma_1)$ is approximated

• Fewer restarts and more iterations per restart result in a lower-quality approximation of a different distribution $D_{TA}(\mu_2, \sigma_2)$ with a smaller expectation $\mu_2 < \mu_1$

• Trade-off!


The following two tables are from Winker and Maringer (2005)

Traveling salesman problem with 442 points described in

Winker (2001, Ch. 8)

Iterations   100 000   1 000 000   10 000 000

Restarts      10 000       1 000          100

Mean            5317        5170         5138

SD              52.8        28.7         21.8

10%             5251        5135         5112

5%              5234        5125         5107

1%              5204        5110         5098

A higher number of iterations produces lower means (µ3 < µ1) and quantiles

But the estimate of the lower quantiles for 10 000 000 iterations is based on 100 observations only!

This does not answer the question of how many restarts to choose for a given amount of resources


Each sequence of restarts from the previous results is divided into 100 sub-results, i.e. instead of considering 10 000 restarts we consider 100 times 100 restarts, etc.

We then count how often the overall best solution has been found in each of the 100 sub-problems.

Iterations                 100 000   1 000 000   10 000 000

Restarts                       100          10            1

Times best in 100                0          65           35

Mean deviation from best      73.5         5.2         14.5

Best choice for number of restarts falls between the 2 extremes!

A moderate number of restarts seems a good choice.


Lecture 3

Portfolio optimization with TA

Outline

• Why do we need heuristics for

Portfolio optimization

– Returns and risk measures

– Mean-variance framework

– Mean-downside risk framework

– Index tracking

– Constraints in practice

• Applications

– Benchmarking TA (mean/variance case)

– Computing mean/downside-risk frontiers

– Tracking an artificial index (benchmarking)

– Tracking market indices

• Conclusions


Traditionally portfolio optimization deals with returns and risk.

In the mean-variance framework one wants either to minimize risk for a given return or to maximize return for a given risk. In such a framework classical optimization methods work efficiently.

More recent approaches use different risk measures, such as VaR (a lower quantile) or the expectation conditional on falling below such a quantile (expected shortfall).

There are also some practical constraints which in general are not considered by the classical approach.

In these situations classical optimization methods can no longer be used.

This is where we suggest the use of heuristic optimization methods.

Returns

Returns of financial assets are random variables

• What distributions are appropriate ?

• How to model dependency ?

• Established facts:

– asymmetry

– fat tails

– in general, not normal



Risk measures (1)

Variance

(second central moment):

$$\mathrm{Var}(v) = E[(v - E v)^2]$$

• Penalizes negative as well as positive

deviations from the mean

• Does not account for asymmetry

• May not exist (e.g. if tails too fat)


Risk measures (2)

Downside-Risk

• Partial (or conditional) moments of distribution

• Related to losses (rather than gains)

• Measures deviation from a target

– benchmark return

– short term interest rate

– desired return


[Figure: distribution of portfolio value with the quantile VaR_p marked for p = 0.05]

Value at Risk: quantile of distribution of portfolio value

$$\mathrm{VaR}_p = F^{-1}(p)$$

Shortfall probability: probability for value of portfolio to fall below $\mathrm{VaR}_p$

$$p = P(v < \mathrm{VaR}_p)$$

Expected Shortfall: expected (conditional) value for losses below the threshold $\mathrm{VaR}_p$

$$\mathrm{ES}_p = E(v \mid v < \mathrm{VaR}_p)$$

Semi-variance:

$$E[(v - E v)^2 \mid v < E v]$$
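Empirical counterparts of these measures can be sketched on a sample of portfolio values. The quantile convention (order statistic ⌈pN⌉) and the fallbacks for empty conditioning sets are assumptions of this example, not definitions from the slides.

```python
import math

def empirical_var(values, p):
    """Empirical p-quantile: the ceil(p * N)-th smallest portfolio value."""
    ordered = sorted(values)
    k = max(0, math.ceil(p * len(ordered)) - 1)
    return ordered[k]

def expected_shortfall(values, p):
    """Mean of the values strictly below VaR_p (VaR_p itself if there are none)."""
    var_p = empirical_var(values, p)
    tail = [v for v in values if v < var_p]
    return sum(tail) / len(tail) if tail else var_p

def semi_variance(values):
    """Mean squared deviation from the mean, over the values below the mean."""
    mean = sum(values) / len(values)
    below = [v for v in values if v < mean]
    return sum((v - mean) ** 2 for v in below) / len(below) if below else 0.0

values = [float(v) for v in range(1, 21)]    # toy portfolio values 1, ..., 20
var_25 = empirical_var(values, 0.25)
es_25 = expected_shortfall(values, 0.25)
sv = semi_variance(values)
```

Both `expected_shortfall` and the shortfall count depend on the ordering of the sample, which is one reason these measures lead to non-smooth objective functions.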


Value at Risk

Industry standard for measuring market risk

(since Basle accord I (1988))

Weak points:

• Estimates a single point of distribution of losses

• Does not inform about size of losses beyond VaR, i.e. extreme events with low probability but catastrophic consequences.

• Does not satisfy sub-additivity (VaR of a portfolio may exceed the sum of the VaRs of the assets in the portfolio).

Expected shortfall appears superior.


Mean-variance framework (1)

• Introduced by Markowitz (1952)

• Principle: a portfolio is optimal if it maximizes

the return for a given level of risk

– One tries to find the most attractive

combination of return and risk

– Return: mean of future expected gains

– Risk: variance of future gains

[Figure: expected return (vertical axis) plotted against standard deviation (horizontal axis)]


Mean-variance framework (2)

Advantages

• Optimization can be done efficiently

• Well introduced among practitioners and

academics

Limitations

• Builds on restrictive hypotheses:

normality of returns

existence of first two moments of distribution

• Lack of flexibility : several practical constraints

cannot be handled with standard

optimization techniques

• Inconveniences of variance as a measure of risk


Mean-Downside Risk framework (1)

A more recent approach for portfolio choice

Introduces downside risk as a criterion for

portfolio choice plus realistic constraints.

Optimization in this framework becomes complex.


Mean – Downside Risk framework (2)

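• Mean – VaR: the investor maximizes the expected future value of the portfolio under the constraint that the probability of the future value falling below VaR does not exceed β

$$\begin{aligned}
\max_x\ & E v \\
& P(v < \mathrm{VaR}) \le \beta \\
& \sum_j x_j = v^0 \\
& x_j^{\ell} \le x_j \le x_j^{u}, \qquad j \in P
\end{aligned}$$

• Mean – Expected Shortfall: the investor constrains the size of losses beyond VaR

$$\begin{aligned}
\max_x\ & E v \\
& E(v \mid v < \mathrm{VaR}) \ge \nu \\
& \sum_j x_j = v^0 \\
& x_j^{\ell} \le x_j \le x_j^{u}, \qquad j \in P
\end{aligned}$$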


Index Tracking (1)

Reproduce performance of a market index by

investing in a small number of assets.

[Figure: timeline t1, t2, t3 with the period of observation [t1, t2] and the tracking period [t2, t3]; annotations: "Construct TPF" and "Rebalance TPF at t3−"]

We consider:

• realistic problem sizes

• realistic constraints


Index Tracking (2)

$n_A + 1$ number of assets in the market

$p_{it}$ price of asset i at time t

$x_{it}$ quantity of asset i in portfolio at time t

$P_t$ composition of portfolio at time t

$$P_t = \{\, x_{it} \mid i = 0, 1, \ldots, n_A \,\}$$

$J_t$ set of indices of assets in $P_t$

$$J_t = \{\, i \mid x_{it} \neq 0 \,\}$$

portfolio is rebalanced at $t^-$


Index Tracking (3)

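$v_{t^-}$ value of portfolio before rebalancing

$$v_{t^-} = \sum_{i=0}^{n_A} x_{i,t-1}\, p_{it}$$

$v_t$ value of portfolio after rebalancing

$$v_t = \sum_{i=0}^{n_A} x_{it}\, p_{it} = \sum_{i \in J_t} x_{it}\, p_{it}$$

$r^I_t$ return of index I for period [t − 1, t]

$$r^I_t = \ln\left(\frac{I_t}{I_{t-1}}\right)$$

$r^P_t$ return of portfolio P for period [t − 1, t]

$$r^P_t = \ln\left(\frac{v_{t^-}}{v_{t-1}}\right) = \ln\left(\frac{\sum_{i=0}^{n_A} x_{i,t-1}\, p_{it}}{\sum_{i=0}^{n_A} x_{i,t-1}\, p_{i,t-1}}\right)$$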


Index Tracking (4)

• Tracking error (TE) Often defined as the

variance of the deviation of portfolio returns

from an index

Such a definition allows for zero TE

even if portfolio underperforms the market

[Figure: S&P 500 index, October 1999 to March 2002; vertical axis from 800 to 1600]


Objective function

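• Tracking error for period [t1, t2]

$$E_{t_1,t_2} = \frac{\left(\sum_{t=t_1}^{t_2} |r^P_t - r^I_t|^{\alpha}\right)^{1/\alpha}}{t_2 - t_1}$$

• Average of deviations

$$R_{t_1,t_2} = \frac{\sum_{t=t_1}^{t_2} (r^P_t - r^I_t)}{t_2 - t_1}$$

• Objective function (to be minimized)

$$F_{t_1,t_2} = \lambda\, E_{t_1,t_2} - (1 - \lambda)\, R_{t_1,t_2}, \qquad \lambda \in [0, 1]$$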
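A small sketch of the objective function, assuming the portfolio and index returns are already available as two lists of equal length and that the divisor t2 − t1 is passed explicitly as `horizon`. The numbers are illustrative.

```python
def tracking_objective(r_p, r_i, horizon, lam, alpha):
    """F = lam * E - (1 - lam) * R for return series r_p (portfolio) and r_i (index)."""
    dev = [p - i for p, i in zip(r_p, r_i)]
    tracking_error = sum(abs(d) ** alpha for d in dev) ** (1.0 / alpha) / horizon
    mean_deviation = sum(dev) / horizon
    return lam * tracking_error - (1.0 - lam) * mean_deviation

r_index = [0.01, 0.01]

# Deviations of +0.01 and -0.01: pure tracking error, zero average deviation.
f_te = tracking_objective([0.02, 0.00], r_index, horizon=2, lam=1.0, alpha=1)

# Perfect tracking gives F = 0.
f_same = tracking_objective(r_index, r_index, horizon=2, lam=0.5, alpha=2)

# Constant underperformance is penalized through both terms when lam < 1.
f_under = tracking_objective([0.0, 0.0], r_index, horizon=2, lam=0.5, alpha=2)
```

The term with R is what distinguishes this objective from a pure variance-type tracking error: a portfolio that lags the index by a constant is no longer rated as perfect.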


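Constraints (all kinds of portfolios)

• Cardinality

$$\# J_t \le K$$

• Size

$$x_{it} \ge 0, \qquad i = 0, \ldots, n_A$$

$$\varepsilon_i \le \frac{x_{it}\, p_{it}}{\sum_{i \in J_t} x_{it}\, p_{it}} \le \delta_i, \qquad i \in J_t, \qquad 0 \le \varepsilon_i < \delta_i \le 1$$

• Transaction costs

$$C_t \le \gamma\, v_{t^-}$$

• Minimum round lots

$$x_{it} = y_{it}\, s_i$$

$s_i$ lot size

$y_{it}$ number of lots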


The optimization problem

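$$\begin{aligned}
\min_{P_{t_1}}\ & F_{t_1,t_2} = \lambda\, E_{t_1,t_2} - (1 - \lambda)\, R_{t_1,t_2} \\
& C_{t_1} \le \gamma\, v_{t_1^-} \\
& \sum_{i \in J_{t_1}} p_{i,t_1}\, x_{i,t_1} + C_{t_1} = v_{t_1^-} \\
& \varepsilon_i \le \frac{p_{i,t_1}\, x_{i,t_1}}{\sum_{i \in J_{t_1}} p_{i,t_1}\, x_{i,t_1}} \le \delta_i, \qquad i \in J_{t_1} \\
& \# J_{t_1} \le K
\end{aligned}$$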


Other nonstandard objective functions

• maximize the probability that return on portfolio beats return on benchmark by a given percentage before going below it by more than another percentage

• minimize the expected time until portfolio beats the benchmark

• maximize the expected reward obtained upon reaching the performance goal

• minimize the expected penalty paid upon falling to a shortfall level

• . . .


Optimization tools

• QP works for:

– Mean – variance with short selling constraints,

size and class constraints and

linear or convex transaction costs

– Index tracking with variance as TE and same

constraints

– is efficient (comes with standard software)

• Standard optimization techniques can no longer be used if we add constraints on:

– cardinality (number of assets in portfolio)

– round lots

– buy-in threshold

– non convex transaction costs


For this problem, classical methods fail to produce reliable results and we have to resort to heuristic optimization.


Related work

• Beasley, Meade and Chang (1999) → GA

• Chang, Meade, Beasley and Sharaiha → SA, GA, TS (cardinality constraints)

• Bertsimas, Darnell and Soucy (1999) → MIP

• Mansini and Speranza (1999) → LP-based heuristics (round lots)

• Konno and Wijayanayake (2001) → (concave transaction costs)

• Krokhmal, Palmquist and Uryasev (2002); Rockafellar and Uryasev (2000, 2002) → LP (C-VaR)

• Lobo, Fazel and Boyd (2000) → MIP-based heuristics

• Jobst, Horniman, Lucas and Mitra (2000) → QMIP

• Kleber and Maringer (2001) → Ant colonies, GA


Parameters:

– $n_R$: number of rounds

– $n_S$: steps per round

– $\tau_r$, $r = 1, 2, \ldots, n_R$: threshold sequence

Implementation steps

Definition of:

– (Objective function $f(x)$)

– Neighborhood $x^{\mathrm{new}} \in \mathcal{N}_{x^{\mathrm{old}}}$

∗ Draw 2 assets a, b with probability $1/n_A$

∗ Move fraction q from a to b

∗ Check if constraints hold

– Thresholds

∗ Evaluate objective function for a large number of random portfolios

∗ Compute neighbors and their distances

∗ Compute empirical distribution of distances

∗ Threshold defined by quantiles
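The implementation steps above can be sketched as follows. This is an illustration under simplifying assumptions, not the implementation used for the results in these slides: portfolios are weight vectors that sum to one, the only constraint enforced is non-negativity, the quantile levels (0.8 down to 0) and the number of random portfolios (500) are arbitrary choices, and the best solution visited is tracked as a convenience. All function and argument names are hypothetical.

```python
import numpy as np

def neighbor(w, q, rng):
    """Move up to a fraction q of total weight from asset a to asset b."""
    a, b = rng.choice(len(w), size=2, replace=False)  # each asset equally likely
    step = min(q, w[a])                               # keeps w[a] >= 0
    w_new = w.copy()
    w_new[a] -= step
    w_new[b] += step
    return w_new

def threshold_sequence(f, n_assets, q, n_rounds, rng, n_samples=500):
    """Decreasing thresholds from quantiles of |f(neighbor) - f(portfolio)|."""
    dist = np.empty(n_samples)
    for k in range(n_samples):
        w = rng.dirichlet(np.ones(n_assets))          # random long-only portfolio
        dist[k] = abs(f(neighbor(w, q, rng)) - f(w))
    probs = np.linspace(0.8, 0.0, n_rounds)           # assumed quantile levels
    return np.quantile(dist, probs)

def threshold_accepting(f, w0, q, n_rounds, n_steps, seed=0):
    rng = np.random.default_rng(seed)
    taus = threshold_sequence(f, len(w0), q, n_rounds, rng)
    w, fw = w0.copy(), f(w0)
    best_w, best_f = w.copy(), fw
    for tau in taus:                                  # rounds r = 1, ..., nR
        for _ in range(n_steps):                      # nS steps per round
            w_new = neighbor(w, q, rng)
            f_new = f(w_new)
            if f_new - fw <= tau:                     # accept unless much worse
                w, fw = w_new, f_new
                if fw < best_f:
                    best_w, best_f = w.copy(), fw
    return best_w, best_f
```

Unlike simulated annealing, the acceptance rule is deterministic: a neighbor is accepted whenever it worsens the objective by no more than the current threshold, and the thresholds shrink from round to round.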


Benchmarking the TA algorithm

• Data from SMI (1997–99)

• Pentium III 800 MHz, Matlab 5.x

Mean-variance solution with QP and TA

[Figure: expected return vs. variance of portfolio, showing the starting portfolio and the optimized portfolio]

[Figure: weight vs. asset index, comparing the QP and TA solutions]


Returns on optimized portfolios under shortfall constraints:

Computation of efficient frontiers

• Initial capital $v_0$ = 8000000

• Shortfall probability β = 0.05

• 7700000 ≤ VaR ≤ 8000000

• 7550000 ≤ ES ≤ 7850000

Problem: How to describe the distribution of future returns (prices)?

• generate price scenarios

• define empirical distribution from historical prices

• resample from set of historical prices


Mean-VaR

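($p_s$ resampled from historical prices)

$$\min_x \; -\frac{1}{n_S} \sum_{s=1}^{n_S} x' p_s$$

subject to

$$\#\{ s \mid x' p_s < \mathrm{VaR} \} \le \beta\, n_S$$

$$x' p_0 = v_0$$

$$\iota' z \le K$$

$$\left\lceil \frac{\omega_j^{l}\, v_0\, z_j}{p_{0j}} \right\rceil \le x_j \le \left\lfloor \frac{\omega_j^{u}\, v_0\, z_j}{p_{0j}} \right\rfloor, \qquad j = 1, \ldots, n_A$$

$$z_j \in \{0, 1\}, \qquad j = 1, \ldots, n_A$$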
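To make the scenario-based formulation concrete, the sketch below evaluates the objective and the VaR constraint for a given holding vector x, and computes the integer holding bounds. It assumes the scenarios are stored as a matrix with one row per resampled price vector, and that the weight bounds are the same for all assets. The names `mean_var_check` and `holding_bounds` are illustrative.

```python
import numpy as np

def mean_var_check(x, scenarios, var_level, beta):
    """Return (objective, feasible) for holdings x.

    scenarios : array (n_S, n_A), one resampled price vector p_s per row
    """
    v = np.asarray(scenarios, dtype=float) @ np.asarray(x, dtype=float)  # x'p_s
    objective = -v.mean()                      # -(1/n_S) * sum_s x'p_s
    n_short = int(np.sum(v < var_level))       # #{s | x'p_s < VaR}
    return objective, bool(n_short <= beta * len(v))

def holding_bounds(w_low, w_up, v0, z, p0):
    """Integer bounds ceil(w_low*v0*z/p0) <= x <= floor(w_up*v0*z/p0)."""
    z = np.asarray(z, dtype=float)
    p0 = np.asarray(p0, dtype=float)
    return np.ceil(w_low * v0 * z / p0), np.floor(w_up * v0 * z / p0)
```

An asset with $z_j = 0$ gets both bounds equal to zero, so it is excluded from the portfolio, which is how the binary variables enforce the cardinality limit.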


Mean-Expected Shortfall

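($p_s$ resampled from historical prices)

$$\min_x \; -\frac{1}{n_S} \sum_{s=1}^{n_S} v_s$$

subject to

$$\frac{1}{\#\{ s \mid v_s < \mathrm{VaR} \}} \sum_{s \mid v_s < \mathrm{VaR}} v_s \ge \nu$$

$$\#\{ s \mid v_s < \mathrm{VaR} \} \le \beta\, n_S$$

$$x' p_0 = v_0$$

$$\iota' z \le K$$

$$\left\lceil \frac{\omega_j^{l}\, v_0\, z_j}{p_{0j}} \right\rceil \le x_j \le \left\lfloor \frac{\omega_j^{u}\, v_0\, z_j}{p_{0j}} \right\rfloor, \qquad j = 1, \ldots, n_A$$

$$z_j \in \{0, 1\}, \qquad j = 1, \ldots, n_A$$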
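The expected shortfall term in the constraint above is the average of the scenario values that fall below VaR. A small sketch follows, assuming the scenario values $v_s$ have already been computed (for example as $x' p_s$, by analogy with the Mean-VaR problem). Returning infinity when no scenario falls below VaR is a convention chosen here so that the constraint ES ≥ ν then holds; it is not taken from the slides.

```python
import numpy as np

def expected_shortfall(v, var_level):
    """Mean of the scenario values v_s that fall below var_level."""
    v = np.asarray(v, dtype=float)
    tail = v[v < var_level]
    if tail.size == 0:
        return np.inf          # assumed convention: no shortfall scenarios
    return float(tail.mean())
```

For example, with scenario values 50, 48, 52, 49, 47 and a VaR level of 49.5, the tail consists of 48, 49 and 47, so the expected shortfall is 48.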


Efficient frontier for Mean-VaR/ES:

• 19 assets of SMI (1997–99)

• for period t, compute 500 two-week returns

• resample from the set of 500 returns

• compute Mean-VaR/ES portfolios $v_i^1$, $i = 1, \ldots, n$

$v_0$ = 8000000

for i = 1 to n do

    $P(v_i^1 < \mathrm{VaR}_i) \le 0.05$

    $E(v_i^1 \mid v_i^1 < \mathrm{VaR}_i) \ge \mathrm{ES}_i$

    max number of assets in portfolio: 10

    min/max holding size: [0.005, 0.5] $v_0$

    transaction cost = 0

    no short selling

end for


Efficient frontier for Mean-VaR:

[Figure: expected portfolio value (8.00 to 8.07 × 10^6) vs. VaR (7.7 to 8.0 × 10^6)]


Efficient frontier for Mean-ES:

[Figure: expected portfolio value (8.00 to 8.07 × 10^6) vs. Expected Shortfall (7.5 to 7.9 × 10^6)]


Index Tracking: Benchmarking with artificial index

• Data set from Beasley: http://mscmga.ms.ic.ac.uk/jeb/orlib/indtrackinfo.html

– Hang Seng (31 assets)

– DAX 100 (85 assets)

– FTSE 100 (89 assets)

– S&P 100 (98 assets)

– Nikkei (225 assets)

– global set (528 assets)

• construct index by randomly choosing K assets and their weights, $(\varepsilon_i = 0.01) \le \omega_i \le (\delta_i = 1)$

• find with TA the portfolio that tracks the index

– starting portfolio randomly chosen

– α = 1

– no constraint on transaction costs

– no cardinality restriction

• repeat optimization 1000 times and count the number of times TA finds the assets in the index
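One way such an artificial index could be constructed is sketched below. The slides do not say how the random weights are drawn or how the index series is formed, so both are assumptions here: each weight is ε plus a random share of the remainder (so the weights sum to one and respect the lower bound), and the index is a buy-and-hold combination of prices normalised to 1 at the first date. The name `artificial_index` is hypothetical.

```python
import numpy as np

def artificial_index(prices, K, eps=0.01, seed=0):
    """Build an index from K randomly chosen assets.

    prices : array (n_periods, n_assets) of asset prices
    Returns (assets, weights, index_series).
    """
    rng = np.random.default_rng(seed)
    assets = np.sort(rng.choice(prices.shape[1], size=K, replace=False))
    # eps for each asset plus a random split of the remainder; weights sum to 1
    weights = eps + (1.0 - K * eps) * rng.dirichlet(np.ones(K))
    rel = prices[:, assets] / prices[0, assets]   # prices normalised to 1 at t = 0
    return assets, weights, rel @ weights
```

Because the true composition is known, the benchmark can count how often the tracking portfolio found by TA contains exactly the assets in `assets`.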


[Figure: weights vs. asset index for the artificial index portfolio and the tracking portfolio]

[Figure: index value vs. time for the artificial index and the tracking portfolio]


[Figure: expected return vs. tracking error (logarithmic tracking-error axis)]

[Figure: objective function vs. steps (logarithmic objective axis), from the starting portfolio to the solution]


Confidence intervals of the TA solution

[Figure: six panels, one per data set, labelled DAX (ns = 1000/1000), FTSE (ns = 1000/1000), SP (ns = 1000/1000), Nikkei (ns = 996/1000), HangSeng (ns = 917/1000) and AllMarkets (ns = 957/1000)]


Confidence intervals of the TA solution (continued)

[Figure: panel labelled All Markets (ns = 998/1000)]

• mean 6.4 × 10^-5

• standard deviation 2.1 × 10^-5


Tracking errors and execution times

Index          Number of assets   Tracking error   Time (sec)

Hang Seng             31           1.80 × 10^-5         5

DAX                   85           4.65 × 10^-5         6

FTSE                  89           3.11 × 10^-5         7

S&P                   98           4.85 × 10^-5         7

Nikkei               225           1.80 × 10^-4        13

All markets          528           2.02 × 10^-4        22


Out-of-sample performance

• observe market index in period [100, 245]

• find tracking portfolio and look at its performance in period [245, 290]

[Figure: weights vs. asset index for the initial portfolio and the tracking portfolio; total transaction costs = 1.81%]

[Figure: index value vs. time for the index and the tracking portfolio; in-sample TE = 4.50e-003, out-of-sample TE = 7.22e-003]


Out-of-sample performance (continued)

[Figure: index value vs. time (240 to 255) for the index and the tracking portfolio, with the rebalancing cost marked]

Results with various constraints on K and TCmax

         TCmax = 2.0%                    TCmax = 0.4%

K        TEis           TEos             TEis           TEos

4        8.00 × 10^-3   1.16 × 10^-2     8.00 × 10^-3   1.25 × 10^-2

10       4.50 × 10^-3   7.20 × 10^-3     5.92 × 10^-3   9.00 × 10^-3

20       1.23 × 10^-3   2.02 × 10^-3     4.59 × 10^-3   5.95 × 10^-3


Conclusions

The threshold accepting algorithm:

• makes it easy to handle all sorts of constraints of practical importance:

– cardinality constraints (integer constraints limiting the portfolio to a specified number of assets)

– limits on the proportion held in a given asset

– class constraints

– minimum round lots

– transaction costs

– . . .

• is computationally efficient (the larger the problem, the more efficient)

• is easy to implement

• provides useful approximations of optima

TA opens new perspectives in the practice of portfolio management.

