
Empirical Methods for the Analysis of Algorithms

Marco Chiarandini (1), Luís Paquete (2)

(1) Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
(2) CISUC - Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal

September 7, 2007, Workshop on Engineering SLS Algorithms

Based on: M. Chiarandini, L. Paquete, M. Preuss, E. Ridge (2007). Experiments on metaheuristics: Methodological overview and open issues. Tech. Rep. DMF-2007-03-003, The Danish Mathematical Society.

Outlook

The tutorial is about:

Applied Statistics for Engineers and Scientists of SLS Algorithms

We aim to provide:

- basic notions of statistics
- a review of performance measures
- an overview of scenarios with proper tools for
  - exploratory data analysis
  - statistical inference

Outline

1. Definitions and Motivations
   SLS Algorithms / Experimental Analysis / Performance Measures

2. Exploratory Data Analysis
   Representation of Sampled Data / Regression Analysis / Characterization and Model Fitting

3. Inferential Statistics

1. Definitions and Motivations

The Objects of Analysis

We consider search algorithms for solving (mainly) discrete optimization problems.

For each general problem Π (e.g., TSP, GCP) we denote by CΠ a set (or class) of possible instances and by π ∈ CΠ a single instance.

The objects of analysis are SLS algorithms, i.e., randomized search heuristics (with no “known” guarantee of optimality).

Definitions

More precisely:

- single-pass heuristics (denoted A⊥): have an embedded termination, for example, upon reaching a certain state
  (generalized optimization Las Vegas algorithms [Hoos and Stützle, 2004])

- asymptotic heuristics (denoted A∞): do not have an embedded termination and they might improve their solution asymptotically
  (both probabilistically approximately complete and essentially incomplete [Hoos and Stützle, 2004])

Definitions

The most typical scenario considered in research on SLS algorithms:

Asymptotic heuristics with time limit decided a priori: the algorithm A∞ is halted when time expires.

Deterministic case: A∞ on π returns a solution of cost x.
The performance of A∞ on π is the scalar y = x.

Randomized case: A∞ on π returns a solution of cost X, where X is a random variable.
The performance of A∞ on π is the univariate Y = X.

[This is not the only relevant scenario: to be refined later]

Generalization

On a specific instance, the random variable Y that defines the performance measure of an algorithm is described by its probability distribution/density function

p(y | π)

It is often more interesting to generalize the performance to a class of instances CΠ, that is,

p(y, CΠ) = ∑_{π ∈ CΠ} p(y | π) p(π)

Theory vs Practice

Task: explain the performance of algorithms

Theoretical Analysis:

- worst-case analysis: considers all possible instances of a problem
- average-case analysis: assumes knowledge of the distribution of problem instances

But:

- results may have low practical relevance
- problems and algorithms are complex

Experimental Analysis:

- it is (often) easy and fast to collect data
- results are fast and have practical relevance

Experimental Algorithmics

Experimental Algorithmics: “is concerned with the design, implementation, tuning, debugging and performance analysis of computer programs for solving algorithmic problems”.

[Demetrescu and Italiano, 2000]

It looks at algorithms as a problem of the natural sciences instead of “only” as a mathematical problem.

Goals

- Defining standard methodologies
- Identifying and collecting problem instances from the real world and from instance generators
- Comparing the relative performance of algorithms so as to identify the best ones for a given application
- Identifying algorithm separators, i.e., families of problem instances for which performance differs
- Providing new insights into algorithm design

Experimental Algorithmics

[Diagram (McGeoch, 1996): Mathematical Model (Algorithm) → Simulation Program → Experiment]

Algorithmic models of programs can vary according to their level of instantiation:

- minimally instantiated (algorithmic framework), e.g., simulated annealing
- mildly instantiated: includes implementation strategies (data structures)
- highly instantiated: includes details specific to a particular programming language or computer architecture

Sampling

In experiments,

- we sample the population of instances and
- we sample the performance of the algorithm on each sampled instance

If on an instance π we run the algorithm r times, then we have r replicates of the performance measure Y, denoted Y1, ..., Yr, which are independent and identically distributed (i.i.d.), i.e.

p(y1, ..., yr | π) = ∏_{j=1}^{r} p(yj | π)

p(y1, ..., yr) = ∑_{π ∈ CΠ} p(y1, ..., yr | π) p(π).

Test Instance Selection

In real-life applications a simulation of p(π) can be obtained from historical data.

In research studies instances may be:

- real-world instances
- random variants of real-world instances
- instances from online libraries
- randomly generated instances

They may be grouped into classes according to features whose impact may be worth studying:

- type (for features that might impact performance)
- size (for scaling studies)
- hardness (focus on hard instances)
- application (e.g., CSP encodings of scheduling problems), ...

Within a class, instances are drawn with uniform probability p(π) = c.

Statistical Methods

The analysis of performance is based on finite-sized sampled data. Statistics provides the methods and the mathematical basis to

- describe and summarize the data (descriptive statistics)
- make inferences from those data (inferential statistics)

In research, statistics helps to

- guarantee replicability
- make results reliable
- extract relevant results from large amounts of data

In the practical context of heuristic design and implementation (i.e., engineering), statistics helps to make sound decisions with the least amount of experimentation.

Objectives of the Experiments

- Characterization:
  Interpolation: fitting models to data
  Extrapolation: building models of data, explaining phenomena
  Standard statistical methods: linear and non-linear regression, model fitting

- Comparison:
  bigger/smaller, same/different, algorithm configuration, component-based analysis
  Standard statistical methods: experimental designs, hypothesis testing and estimation

[Plot: run time in seconds vs. instance size for uniform random graphs with edge density p = 0, 0.1, 0.2, 0.5, 0.9]

[Plot: distributions of the response of five algorithms (Alg. 1-5), shown as densities and boxplots]

Measures and Transformations

On a single instance

Computational effort indicators

- CPU time (real time as measured by OS functions)
- number of elementary operations / algorithmic iterations (e.g., search steps, objective function evaluations, number of visited nodes in the search tree, consistency checks, etc.)

Solution quality indicators

- value returned by the cost function (or error from the optimum/reference value)

Measures and Transformations

On a class of instances

Computational effort indicators

- no transformation if the interest is in studying scaling
- standardization if a fixed time limit is used
- otherwise, it is better to group the instances homogeneously

Solution quality indicators
Different instances imply different scales ⇒ need for an invariant measure

But also, see [McGeoch, 1996].

Measures and Transformations

On a class of instances

Solution quality indicators

- Distance or error from a reference value (assume the minimization case; see the sketch after this list):

  e1(x, π) = (x(π) − x̄(π)) / σ̂(π)                         standard score
  e2(x, π) = (x(π) − x_opt(π)) / x_opt(π)                   relative error
  e3(x, π) = (x(π) − x_opt(π)) / (x_worst(π) − x_opt(π))    invariant [Zemel, 1981]

  - optimal value computed exactly or known by instance construction
  - surrogate value such as bounds or best-known values

- Rank (no need for standardization, but loss of information)
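A minimal Python sketch of these transformations (the per-instance reference values — mean, standard deviation, optimal and worst cost — are assumed to be available or estimated; all numbers below are hypothetical):

def standard_score(x, mean_cost, sd_cost):
    """e1: deviation from the instance mean in units of standard deviations."""
    return (x - mean_cost) / sd_cost

def relative_error(x, opt):
    """e2: relative deviation from the optimal (or reference) value."""
    return (x - opt) / opt

def invariant_error(x, opt, worst):
    """e3: position between the optimal and the worst value (Zemel, 1981)."""
    return (x - opt) / (worst - opt)

# Hypothetical values for one run on one instance:
print(standard_score(105.0, 100.0, 4.0))     # 1.25
print(relative_error(105.0, 100.0))          # 0.05
print(invariant_error(105.0, 100.0, 140.0))  # 0.125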

2. Exploratory Data Analysis

Scenarios (Refinement)

- Single-pass heuristics

- Asymptotic heuristics: three approaches:
  1. Univariate
     1.a Time as an external parameter decided a priori
     1.b Solution quality as an external parameter decided a priori
  2. Cost dependent on running time
  3. Cost and running time as two minimizing objectives

Definitions

Single-pass heuristics

Deterministic case: A⊥ on π returns a solution of cost x with computational effort t (e.g., running time).
The performance of A⊥ on π is the vector y = (x, t).

Randomized case: A⊥ on π returns a solution of cost X with computational effort T, where X and T are random variables.
The performance of A⊥ on π is the bivariate Y = (X, T).

Single-pass Heuristics

Bivariate analysis: Example

Scenario:

- 3 heuristics A⊥1, A⊥2, A⊥3 on class CΠ
- homogeneous instances, or need for data transformation
- 1 or r runs per instance
- Interest: inspecting solution cost and running time to observe and compare the level of approximation and the speed

Tools:

- Scatter plots of solution cost and run time

[Scatter plot: solution cost vs. run time for the heuristics DSATUR, RLF and ROS]

Definitions

Asymptotic heuristics. There are three approaches:

1.a. Time as an external parameter decided a priori. The algorithm is halted when time expires.

Deterministic case: A∞ on π returns a solution of cost x.
The performance of A∞ on π is the scalar y = x.

Randomized case: A∞ on π returns a solution of cost X, where X is a random variable.
The performance of A∞ on π is the univariate Y = X.

Asymptotic Heuristics

Approach 1.a, Univariate analysis

Scenario:

- 3 heuristics A∞1, A∞2, A∞3 on class CΠ
  (or 3 heuristics A∞1, A∞2, A∞3 on class CΠ without interest in computation time because it is negligible or comparable)
- homogeneous instances (no data transformation) or heterogeneous instances (data transformation)
- 1 or r runs per instance
- a priori time limit imposed
- Interest: inspecting solution cost

Tools:

- Histograms (summary measures: mean, median or mode?)
- Boxplots
- Empirical cumulative distribution functions (ECDFs) (see the sketch below)
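A minimal Python sketch of these tools on assumed data (the samples for the three hypothetical algorithms A1-A3 are generated at random; only summary numbers and the ECDF points are computed, plotting is omitted):

import numpy as np

rng = np.random.default_rng(0)
costs = {                                     # hypothetical solution costs
    "A1": rng.normal(100, 3, size=30),
    "A2": rng.normal(103, 5, size=30),
    "A3": rng.normal(101, 2, size=30),
}

def ecdf(sample):
    """Points (x, Fn(x)) of the empirical cumulative distribution function."""
    x = np.sort(sample)
    return x, np.arange(1, len(x) + 1) / len(x)

for name, sample in costs.items():
    x, y = ecdf(sample)                       # step-function points, e.g. for plotting
    print(f"{name}: mean={sample.mean():.1f}  median={np.median(sample):.1f}  "
          f"P(cost <= 100)={np.mean(sample <= 100.0):.2f}")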

Asymptotic Heuristics

Approach 1.a, Univariate analysis: Example

Data representation

[Figure: the same sample of solution costs shown as a histogram, a boxplot (median, quartiles Q1 and Q3, IQR, whisker at Q1 − 1.5·IQR, outliers, average) and an empirical cumulative distribution function]

Measures of central tendency (mean, median, mode) and dispersion (variance, standard deviation, inter-quartile range)

Asymptotic Heuristics

Approach 1.a, Univariate analysis: Example

On a class of instances

[Figure: boxplots of three tabu search variants TS1, TS2, TS3 on a class of instances, under four transformations: standard score, relative error, invariant error, and ranks]

Asymptotic Heuristics

Approach 1.a, Univariate analysis: Example

On a class of instances

[Figure: empirical cumulative distribution functions of the same data for TS1, TS2, TS3 under the four transformations: standard score, relative error, invariant error, and ranks]

Stochastic Dominance

Definition: Algorithm A1 probabilistically dominates algorithm A2 on a problem instance iff its CDF is always “above” that of A2, i.e.:

F1(x) ≥ F2(x), ∀x ∈ X

[Figure: two example pairs of empirical CDFs F(x)]
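A minimal Python sketch of this check on assumed data: the two ECDFs are evaluated on the pooled support and compared pointwise, following the definition above:

import numpy as np

def ecdf_at(sample, xs):
    sample = np.sort(sample)
    return np.searchsorted(sample, xs, side="right") / len(sample)

def prob_dominates(sample1, sample2):
    """True if the ECDF of sample1 lies on or above that of sample2 everywhere."""
    xs = np.union1d(sample1, sample2)
    return bool(np.all(ecdf_at(sample1, xs) >= ecdf_at(sample2, xs)))

a1 = np.array([15, 18, 20, 22, 25, 27])   # hypothetical costs of A1
a2 = np.array([19, 23, 26, 30, 34, 41])   # hypothetical costs of A2
print(prob_dominates(a1, a2))             # True: A1 probabilistically dominates A2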

Definitions

Asymptotic heuristics. There are three approaches:

1.b. Solution quality as an external parameter decided a priori. The algorithm is halted when the quality is reached.

Deterministic case: A∞ on π finds a solution in running time t.
The performance of A∞ on π is the scalar y = t.

Randomized case: A∞ on π finds a solution in running time T, where T is a random variable.
The performance of A∞ on π is the univariate Y = T.

Dealing with Censored Data

Asymptotic heuristics, Approach 1.b

- Heuristics A⊥ stopped before completion, or A∞ truncated (always the case)
- Interest: determining whether a prefixed goal (optimal/feasible solution) has been reached

The computational effort to attain the goal can be specified by a cumulative distribution function F(t) = P(T < t) with T in [0, ∞).

If in a run i we stop the algorithm at time Li then we have Type I right censoring, that is, we know either

- Ti, if Ti ≤ Li,
- or only that Ti > Li.

Hence, for each run i we need to record min(Ti, Li) and the indicator variable for observed optimal/feasible solution attainment, δi = I(Ti ≤ Li).

Dealing with Censored Data

Asymptotic heuristics, Approach 1.b: Example

- An exact vs. a heuristic algorithm for the 2-edge-connectivity augmentation problem.
- Interest: time to find the optimum on different instances.

[Figure: empirical CDFs of the time to find the optimum for the heuristic and the exact algorithm]

Uncensored:   F(t) = (# runs with time < t) / n

Censored:   F(t) = 1 − ∏_{j: tj < t} (nj − dj) / nj,   t ∈ [0, L]

(Kaplan-Meier estimation model)
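A minimal Python sketch of the Kaplan-Meier estimate from Type I right-censored run times (the data below are hypothetical; ties are handled naively, one event at a time):

import numpy as np

def kaplan_meier(times, observed):
    """Estimate F(t) = P(T <= t) from times[i] = min(Ti, Li) and
    observed[i] = 1 if the goal was attained before the cutoff Li."""
    times = np.asarray(times, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    order = np.argsort(times)
    times, observed = times[order], observed[order]
    n = len(times)
    surv, ts, F = 1.0, [], []
    for j, (t, event) in enumerate(zip(times, observed)):
        at_risk = n - j                  # nj: runs still unfinished just before t
        if event:                        # dj = 1 event (goal attained) at time t
            surv *= (at_risk - 1) / at_risk
            ts.append(t)
            F.append(1.0 - surv)
    return np.array(ts), np.array(F)

# 8 hypothetical runs, the last two censored at L = 100
t = [3, 7, 12, 20, 35, 60, 100, 100]
d = [1, 1, 1, 1, 1, 1, 0, 0]
print(kaplan_meier(t, d))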

Definitions

Asymptotic heuristics. There are three approaches:

2. Cost dependent on running time:

Deterministic case: A∞ on π returns a current best solution x at each observation time t1, ..., tk.
The performance of A∞ on π is the profile given by the vector y = (x(t1), ..., x(tk)).

Randomized case: A∞ on π produces a monotone stochastic process in solution cost X(τ), with each element dependent on its predecessors.
The performance of A∞ on π is the multivariate Y = (X(t1), X(t2), ..., X(tk)).

Asymptotic Heuristics

Approach 2, Multivariate analysis

Scenario:

- 3 heuristics A∞1, A∞2, A∞3 on instance π
- single instance, hence no data transformation
- r runs
- Interest: inspecting solution cost over running time to determine whether the comparison varies over time intervals

Tools:

- Quality profiles

Asymptotic Heuristics

Approach 2, Multivariate analysis: Example

The performance is described by multivariate random variables of the kind Y = (Y(t1), Y(t2), ..., Y(tk)).

Sampled data are of the form Yi = (Yi(t1), Yi(t2), ..., Yi(tk)), i = 1, ..., 10 (10 runs per algorithm on one instance).

[Figure: solution-cost profiles over time of 10 runs of Novelty and Tabu Search on one instance]

[Figure: the same data plotted per time occasion (1-24): the colors obtained by Novelty and Tabu Search at each observation time]

[Figure: the median solution-cost profile over time of Novelty and Tabu Search]
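A minimal Python sketch of a median quality profile computed from assumed data (a runs × observation-times cost matrix generated at random purely for illustration):

import numpy as np

t_obs = np.array([10, 50, 100, 200, 400, 800, 1200])     # observation times
rng = np.random.default_rng(1)
# hypothetical costs: profiles[i, k] = best cost of run i at time t_obs[k]
profiles = 100 - np.sort(rng.integers(0, 30, size=(10, len(t_obs))), axis=1)

median_profile = np.median(profiles, axis=0)              # the "typical" run over time
q1, q3 = np.percentile(profiles, [25, 75], axis=0)
for t, lo, med, hi in zip(t_obs, q1, median_profile, q3):
    print(f"t={t:5d}  Q1={lo:5.1f}  median={med:5.1f}  Q3={hi:5.1f}")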

Definitions

Asymptotic heuristics. There are three approaches:

3. Cost and running time as two minimizing objectives [da Fonseca et al., 2001]:

Deterministic case: A∞ on π finds improvements in solution cost, which are represented by a set of non-dominated points y = {y_j ∈ R², j = 1, ..., m} of solution cost and running time.

Randomized case: A∞ on π finds improvements in solution cost, which are represented by a random variable, i.e., a random set of non-dominated points Y = {Y_j ∈ R², j = 1, ..., m}.
The performance of A∞ on π is measured by the random set Y.

Asymptotic Heuristics

Approach 3, Bi-objective analysis

Some definitions on dominance relations are needed.

In a run on π, A∞ passes through states, i.e., a set of points y = {y_j ∈ R², j = 1, ..., m}, each one being a vector of solution cost and running time.

In the Pareto sense, for points in R²:

y¹ ⪯ y²   weakly dominates    y¹_i ≤ y²_i for all i = 1, 2
y¹ ∥ y²   incomparable        neither y¹ ⪯ y² nor y² ⪯ y¹

Hence we can summarize the run of the algorithm by a set of mutually incomparable points (i.e., weakly non-dominated points).

Extended to sets of points:

A ⪯ B   weakly dominates    for all b ∈ B there exists a ∈ A such that a ⪯ b
A ∥ B   incomparable        neither A ⪯ B nor B ⪯ A
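A minimal Python sketch of these relations for (cost, time) points and for sets of points (the point sets below are hypothetical):

def weakly_dominates(p, q):
    """p <= q componentwise; both objectives (cost, time) are minimized."""
    return all(pi <= qi for pi, qi in zip(p, q))

def set_weakly_dominates(A, B):
    """Every point of B is weakly dominated by some point of A."""
    return all(any(weakly_dominates(a, b) for a in A) for b in B)

def incomparable(A, B):
    return not set_weakly_dominates(A, B) and not set_weakly_dominates(B, A)

A = [(52, 10), (50, 120), (49, 900)]      # hypothetical run summaries
B = [(53, 15), (51, 200), (50, 1500)]
print(set_weakly_dominates(A, B))         # True
print(incomparable(A, B))                 # False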

Performance Measures

Multi-Objective Case

How to compare the approximation sets from multiple runs of two or more multiobjective algorithms?

For two approximation sets A, B ∈ Ω (Ω the set of all approximation sets), dominance relations may indicate:

  A is better than B        A and B are incomparable
  B is better than A        A and B are indifferent

Other Pareto-compliant quantitative indicators have been introduced:

- Unary indicators (I: Ω → R) (need a reference point):
  - Hypervolume
  - Epsilon indicator
  - R indicator
- Binary indicators (I: Ω × Ω → R):
  - Epsilon indicator
- Ranking
- Attainment function

Note that with unary indicators, if I(A) < I(B), we can only say that A is not worse than B (they may be incomparable) [Zitzler et al., 2003].

Asymptotic Heuristics

Approach 3, Bi-objective analysis

Let Y = {Y_j ∈ R², j = 1, ..., m} be a random set of m mutually independent points of solution cost and run time.

The attainment or hitting function is defined (similarly to a cumulative distribution function) as

F(y) = Pr[Y ⪯ y]
     = Pr[Y_1 ⪯ y ∨ Y_2 ⪯ y ∨ ... ∨ Y_m ⪯ y]
     = Pr[the optimizer attains the goal y in a single run]

Sample data are collections of points Y_1, ..., Y_n obtained by n independent runs. The corresponding ECDF is defined as:

F_n(y) = (1/n) ∑_{i=1}^{n} I(Y_i ⪯ y)

where I(Y_i ⪯ y) = 1 if Y_i1 ⪯ y or Y_i2 ⪯ y or ... or Y_im_i ⪯ y.
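A minimal Python sketch evaluating F_n(y) at one goal point from assumed data (one list of non-dominated (cost, time) points per run):

def attained(run_points, goal):
    """True if at least one point of the run weakly dominates the goal."""
    goal_cost, goal_time = goal
    return any(c <= goal_cost and t <= goal_time for c, t in run_points)

def empirical_attainment(runs, goal):
    """F_n(y): fraction of runs whose point set weakly dominates the goal y."""
    return sum(attained(r, goal) for r in runs) / len(runs)

runs = [                                   # hypothetical (cost, time) traces
    [(55, 100), (52, 800), (50, 3000)],
    [(56, 90), (53, 700), (51, 2500)],
    [(54, 120), (50, 1000)],
]
print(empirical_attainment(runs, goal=(52, 1000)))   # 2 of 3 runs reach cost <= 52 by t = 1000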

Asymptotic Heuristics

Approach 3, Bi-objective analysis

[Figure: empirical attainment function of (colors, iterations) over 50 runs, with the attainment surfaces for the minimum (Fn = 1/50), the median (Fn = 25/50) and the maximum (Fn = 50/50)]

Asymptotic Heuristics

Approach 3, Bi-objective analysis: Example

[Figure: non-dominated (cost, time) points from multiple runs of Novelty and TSinN1 on one instance]

Asymptotic Heuristics

Approach 3, Bi-objective analysis

[Figure: empirical attainment functions of (colors, iterations) for two algorithms]

Conditioning on solution quality or on run time yields two families of curves:

P(T ≤ t | X ≤ x) = |{ i : ∃ (X, T) ∈ Y_i, X ≤ x ∧ T ≤ t }| / |{ i : ∃ (X, T) ∈ Y_i, X ≤ x }|

P(X ≤ x | T ≤ t) = |{ i : ∃ (X, T) ∈ Y_i, X ≤ x ∧ T ≤ t }| / |{ i : ∃ (X, T) ∈ Y_i, T ≤ t }|

[Figure: empirical qualified run-time distributions (success probability vs. number of iterations, one curve per quality level) and empirical solution quality distributions (success probability vs. number of colors, one curve per time limit)]

Correlation Analysis

Scenario:

- heterogeneous instances, hence data transformation
- 1 or r runs per instance
- consider time to goal or solution quality
- Interest: inspecting whether instances are all equally hard to solve for different algorithms, or whether some features make the instances harder

Tools: correlation plots (each point represents an instance), correlation coefficient:

Population:   ρ_XY = cov(X, Y) / (σ_X σ_Y)

Sample:   r_XY = ∑(X_i − X̄)(Y_i − Ȳ) / ((n − 1) s_X s_Y)

[Hoos and Stützle, 2004]
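A minimal Python sketch on assumed data: the sample correlation between the per-instance performance of two hypothetical algorithms, both driven by a hidden hardness feature:

import numpy as np

rng = np.random.default_rng(2)
hardness = rng.uniform(1, 10, size=40)              # hidden instance feature
perf_a1 = 2.0 * hardness + rng.normal(0, 1, 40)     # hypothetical time to goal, A1
perf_a2 = 1.5 * hardness + rng.normal(0, 2, 40)     # hypothetical time to goal, A2

r = np.corrcoef(perf_a1, perf_a2)[0, 1]             # sample correlation coefficient
print(f"r = {r:.2f}")   # close to 1: instances hard for A1 tend to be hard for A2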

Scaling Analysis

Scenario:

- one heuristic A⊥ or A∞ with an a priori quality goal
- data collected on instances of different size or features
- Interest: characterizing the growth of computational effort

Given a set of data points (Ni, Yi) obtained from an experiment in which Yi = f(Ni), for some unknown function f(n), find a growth-function class O(gu(n)) and/or Ω(gl(n)) to which f(n) belongs.

This is a mix of interpolation of the data trend and extrapolation beyond the range of experimentation.

Tools:

- Heuristic adaptation of linear regression techniques
  - Log-log transformation and linear regression (see the sketch below)
  - Box-Cox rules [McGeoch et al., 2002]
- Smoothing techniques
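A minimal Python sketch of the log-log approach on assumed measurements: if the run time grows roughly polynomially, the slope of the fitted line estimates the exponent:

import numpy as np

sizes = np.array([20, 40, 80, 200, 400, 800, 1600])
times = np.array([0.01, 0.05, 0.3, 3.0, 20.0, 150.0, 1100.0])   # hypothetical seconds

# If time ~ c * n^k, then log(time) = log(c) + k * log(n): fit a straight line.
k, log_c = np.polyfit(np.log(sizes), np.log(times), deg=1)
print(f"estimated exponent k = {k:.2f}")   # suggests growth roughly O(n^k)

# Large, systematic residuals would instead hint at a non-polynomial class,
# e.g. exponential growth, which is linear on a semi-log (log y) plot.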

Scaling Analysis

Plots and Linear Trends

[Figure: the functions y = e^x, y = x^e and y = log x plotted with linear axes, log x-axis, log y-axis and log-log axes; each growth class appears as a straight line under a different transformation]

Linear Regression

Simple linear regression: one dependent variable + one independent variable

Yi = β0 + β Xi + εi

It uses the least squares method:

min ∑_i εi²,   where εi = Yi − β0 − β Xi

The indicator of the quality of fit is the coefficient of determination R² (but use it with caution).

Multiple linear regression considers multiple predictors:

Yi = β0 + β1 X1i + β2 X2i + εi

Here the indicator of the quality of fit is the adjusted R² statistic.
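A minimal Python sketch of both models fitted by least squares with numpy (the data, coefficients and noise level are made up):

import numpy as np

rng = np.random.default_rng(3)
x1 = rng.uniform(0, 10, 50)
x2 = rng.uniform(0, 1, 50)
y = 2.0 + 0.5 * x1 + 3.0 * x2 + rng.normal(0, 0.5, 50)   # hypothetical response

X = np.column_stack([np.ones_like(x1), x1, x2])   # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)      # least-squares estimates of the betas

residuals = y - X @ beta
r2 = 1 - residuals.var() / y.var()                # coefficient of determination R^2
print("beta =", np.round(beta, 2), " R^2 =", round(r2, 3))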

Scaling Analysis

Example

Running time of RLF for graph coloring, with two regressors: number of vertices and edge density.

[Plot: run time in seconds vs. instance size for uniform random graphs with edge density p = 0, 0.1, 0.2, 0.5, 0.9]

Characterization of Run-time

Parametric models are used in the analysis of run times to:

- provide more informative experimental results
- make statistically more rigorous comparisons of algorithms
- exploit the properties of the model (e.g., the character of long tails and the completion rate)
- predict missing data in case of censored distributions

Procedure:

- choose a model
- apply a fitting method, e.g., maximum likelihood estimation (sketched below):

  max_{θ ∈ Θ} log ∏_{i=1}^{n} p(Xi; θ)

- test the model
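A minimal Python sketch of this procedure on assumed, uncensored run times: several candidate models are fitted by maximum likelihood with scipy and their log-likelihoods compared (the model-testing step, e.g. a goodness-of-fit test, is left out):

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
run_times = rng.weibull(0.7, size=200) * 50.0     # hypothetical run times

for name, dist in [("exponential", stats.expon),
                   ("weibull", stats.weibull_min),
                   ("log-normal", stats.lognorm),
                   ("gamma", stats.gamma)]:
    params = dist.fit(run_times, floc=0)          # MLE with the location fixed at 0
    loglik = np.sum(dist.logpdf(run_times, *params))
    print(f"{name:12s} log-likelihood = {loglik:.1f}")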

Characterization of Run-time

The distributions used are [Frost et al., 1997; Gomes et al., 2000]:

[Figure: density functions f(x) and hazard functions h(x) of the Exponential, Weibull, Log-normal and Gamma distributions]

Characterization of Run-time

Motivations for these distributions:

- qualitative information on the completion rate (= hazard function)
- empirically good fit

To check whether a parametric family of models is reasonable, the idea is to make plots that should be linear. Departures of the data from linearity can easily be appreciated by eye.

Example: for an exponential distribution,

log S(t) = −λt,   where S(t) = 1 − F(t) is the survivor function,

hence the plot of log S(t) against t should be linear.

Similarly, for the Weibull distribution the cumulative hazard function is linear on a log-log plot.
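A minimal Python sketch computing the quantities to plot for these checks from an assumed, uncensored sample of run times (the plots themselves are omitted):

import numpy as np

rng = np.random.default_rng(5)
t = np.sort(rng.exponential(scale=30.0, size=200))    # hypothetical run times

S = 1.0 - np.arange(1, len(t) + 1) / (len(t) + 1)     # empirical survivor function
log_S = np.log(S)         # plot against t      : straight line => exponential
H = -np.log(S)            # cumulative hazard
log_H = np.log(H)         # plot against log(t) : straight line => Weibull

rate = -np.polyfit(t, log_S, deg=1)[0]            # slope of log S(t), ~ completion rate
shape = np.polyfit(np.log(t), log_H, deg=1)[0]    # slope of log H, ~ Weibull shape
print(f"completion rate ~ {rate:.3f}, Weibull shape ~ {shape:.2f}")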

Characterization of Run-time

Example

Graphical inspection for the two censored distributions from the previous example on 2-edge-connectivity.

[Figure: empirical CDFs of the two algorithms; log S(t) against t (linear ⇒ exponential); log H(t) against log t (linear ⇒ Weibull)]

Characterization of Run-time

Example

[Figure: empirical CDFs of the time to find the optimum, with the fitted models: log-normal for the heuristic, exponential for the exact algorithm]

Extreme Value Statistics

- Extreme value statistics focuses on characteristics related to the tails of a distribution function:
  1. extreme quantiles (e.g., minima)
  2. indices describing tail decay

- ‘Classical’ statistical theory: analysis of means.
  Central limit theorem: for X1, ..., Xn i.i.d. with distribution FX,

  √n (X̄ − µ) / √Var(X) →_D N(0, 1),   as n → ∞

  Heavy-tailed distributions: the mean and/or variance may not be finite!

Characterization of Run-time

Heavy Tails

Gomes et al. [2000] analyze the mean computational cost to find a solution on a single instance.

[Figure: on the left, the observed mean over an increasing number of runs; on the right, the same statistic for data drawn from normal or gamma distributions]

- The use of the median instead of the mean is recommended
- The existence of the moments (e.g., mean, variance) is determined by the tail behavior: a case like the one on the left arises in the presence of long tails

3. Inferential Statistics

- We work with samples (instances, solution quality)
- But we want sound conclusions: generalization over a given population (all possible instances)
- Thus we need statistical inference

[Diagram: population P(x, θ) with parameter θ → random sample Xn → statistical estimator θ̂ → inference about θ]

Since the analysis is based on finite-sized sampled data, statements like

“the cost of solutions returned by algorithm A is smaller than that of algorithm B”

must be completed by

“at a level of significance of 5%”.

A Motivating Example

- There is a competition and two algorithms A1 and A2 are submitted.
- We run both algorithms once on n instances. On each instance either A1 wins (+), or A2 wins (-), or they tie (=).

Questions:

1. If we have only 10 instances and algorithm A1 wins 7 times, how confident are we in claiming that algorithm A1 is the best?
2. How many instances and how many wins should we observe to gain a confidence of 95% that algorithm A1 is the best?

A Motivating Example

- p: probability that A1 wins on each instance (+)
- n: number of runs without ties
- Y: number of wins of algorithm A1

If each run is independent and consistent:

Y ~ b(n, p):   Pr[Y = y] = (n choose y) p^y (1 − p)^(n−y)

Under the conditions of question 1, we can check how unlikely the situation is if it were p(+) ≤ p(−).

If p = 0.5 then the chance that algorithm A1 wins 7 or more times out of 10 is 17.2%: quite high!

[Plot: probability mass function of the binomial distribution b(10, 0.5)]


To answer question 2 we compute the 95% quantile, i.e., y : Pr[Y ≥ y] < 0.05, with p = 0.5 at different values of n:

n   10  11  12  13  14  15  16  17  18  19  20
y    9   9  10  10  11  12  12  13  13  14  15
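A minimal Python sketch reproducing both answers with the binomial model (only scipy.stats.binom and the model above are assumed):

from scipy.stats import binom

# Question 1: probability of 7 or more wins out of 10 when p = 0.5
p_value = binom.sf(6, 10, 0.5)            # P(Y >= 7) = 1 - P(Y <= 6)
print(f"P(Y >= 7 | n=10, p=0.5) = {p_value:.3f}")   # about 0.172

# Question 2: smallest y with P(Y >= y) < 0.05 under p = 0.5, for several n
for n in range(10, 21):
    y = next(y for y in range(n + 1) if binom.sf(y - 1, n, 0.5) < 0.05)
    print(f"n = {n:2d}: at least {y} wins needed")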


Summary

What we saw:

- Simple comparative analysis
- Characterization

What remains:

- Methods for statistical inference
- Experimental design techniques for component-based analysis
- Advanced designs for tuning and configuring

(This will be covered in the next talks and in Ruben’s tutorial)

But also:

- Reactive/learning approaches
- Phase transitions
- Landscape analysis
- ...


References

H. Hoos, T. Stützle (2004). Stochastic Local Search: Foundations and Applications. Morgan Kaufmann Publishers, San Francisco, CA, USA.

C. Demetrescu, G. F. Italiano (2000). What do we learn from experimental algorithmics? In M. Nielsen, B. Rovan (Eds.), MFCS, vol. 1893 of Lecture Notes in Computer Science, pp. 36–51, Springer Verlag, Berlin, Germany.

C. C. McGeoch (1996). Toward an experimental method for algorithm simulation. INFORMS Journal on Computing, vol. 8, no. 1, pp. 1–15.

E. Zemel (1981). Measuring the quality of approximate solutions to zero-one programming problems. Mathematics of Operations Research, vol. 6, no. 3, pp. 319–332.

V. G. da Fonseca, C. M. Fonseca, A. O. Hall (2001). Inferential performance assessment of stochastic optimisers and the attainment function. In C. Coello, D. Corne, K. Deb, L. Thiele, E. Zitzler (Eds.), Proceedings of Evolutionary Multi-Criterion Optimization, First International Conference, EMO 2001, vol. 1993 of Lecture Notes in Computer Science, pp. 213–224, Springer Verlag, Berlin, Germany.

E. Zitzler, L. Thiele, M. Laumanns, C. M. Fonseca, V. G. da Fonseca (2003). Performance assessment of multiobjective optimizers: an analysis and review. IEEE Transactions on Evolutionary Computation, vol. 7, no. 2, pp. 117–132.

C. McGeoch, P. Sanders, R. Fleischer, P. R. Cohen, D. Precup (2002). Using finite experiments to study asymptotic performance. In R. Fleischer, B. Moret, E. M. Schmidt (Eds.), Experimental Algorithmics: From Algorithm Design to Robust and Efficient Software, vol. 2547 of Lecture Notes in Computer Science, pp. 93–126, Springer Verlag, Berlin, Germany.

D. Frost, I. Rish, L. Vila (1997). Summarizing CSP hardness with continuous probability distributions. In AAAI/IAAI, pp. 327–333.

C. Gomes, B. Selman, N. Crato, H. Kautz (2000). Heavy-tailed phenomena in satisfiability and constraint satisfaction problems. Journal of Automated Reasoning, vol. 24, no. 1-2, pp. 67–100.

I. M. Ovacik, S. Rajagopalan, R. Uzsoy (2000). Integrating interval estimates of global optima and local search methods for combinatorial optimization problems. Journal of Heuristics, vol. 6, no. 4, pp. 481–500.

J. Hüsler, P. Cruz, A. Hall, C. M. Fonseca (2003). On optimization and extreme value theory. Methodology and Computing in Applied Probability, vol. 5, pp. 183–195.

M. Birattari (2005). The Problem of Tuning Metaheuristics as Seen from a Machine Learning Perspective. No. DISKI 292, Infix/Aka, Berlin, Germany.

J. D. Petruccelli, B. Nandram, M. Chen (1999). Applied Statistics for Engineers and Scientists. Prentice Hall, Englewood Cliffs, NJ, USA.

W. Conover (1999). Practical Nonparametric Statistics. John Wiley & Sons, New York, NY, USA, 3rd edn.

D. C. Montgomery, G. C. Runger (2007). Applied Statistics and Probability for Engineers. John Wiley & Sons, 4th edn.

J. F. Lawless (1982). Statistical Models and Methods for Lifetime Data. Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons.

G. Seber (2004). Multivariate Observations. Wiley Series in Probability and Statistics, John Wiley & Sons.

M. H. Kutner, C. J. Nachtsheim, J. Neter, W. Li (2005). Applied Linear Statistical Models. McGraw Hill, 5th edn.