Slightly beyond Turing’s computability for studying Genetic Programming
Olivier Teytaud, Tao, Inria, Lri, UMR CNRS 8623, Univ. Paris-Sud, Pascal, Digiteo

Outline

- What is genetic programming
- Formal analysis of Genetic Programming
- Why is there nothing other than Genetic Programming?
  - Computability point of view
  - Complexity point of view

What is Genetic Programming (GP)?

GP = mining Turing-equivalent spaces of functions.

Typical example: symbolic regression.
Inputs:
- $x_1, x_2, \dots, x_N \in \{0,1\}^*$ and $y_1, y_2, \dots, y_N \in \{0,1\}$, with $y_i = f(x_i)$;
- the pairs $(x_i, y_i)$ are assumed independently and identically distributed (unknown probability distribution).

Goal: find $g$ such that

$\mathbb{E}\,|g(x) - y| + C\,\mathbb{E}\,\mathrm{Time}(g, x)$

is as small as possible, where the constant $C$ trades accuracy against running time.
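As a rough illustration, the objective can be estimated on a finite sample by replacing each expectation with an empirical mean. The sketch below assumes a hypothetical convention (ours, not the slides') in which a program returns both its output and the number of steps it used, so that Time(g, x) is a step count; `empirical_objective` and `parity` are illustrative names.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical convention (ours, not the slides'): a program maps an input bitstring
# to (output, number_of_steps_used), so Time(g, x) is measured as a step count.
Program = Callable[[str], Tuple[int, int]]


def empirical_objective(g: Program, sample: Sequence[Tuple[str, int]], C: float) -> float:
    """Estimate E|g(x) - y| + C * E Time(g, x) by empirical means over an i.i.d. sample."""
    if not sample:
        raise ValueError("need at least one example")
    total_error = 0
    total_time = 0
    for x, y in sample:
        output, steps = g(x)
        total_error += abs(output - y)
        total_time += steps
    n = len(sample)
    return total_error / n + C * total_time / n


def parity(x: str) -> Tuple[int, int]:
    """Example program: parity of the bits of x, charged one step per bit."""
    return sum(int(b) for b in x) % 2, len(x)


sample = [("0", 0), ("11", 0), ("101", 0), ("1", 1)]  # y = parity of x
print(empirical_objective(parity, sample, C=0.1))            # exact, but pays for time
print(empirical_objective(lambda x: (0, 1), sample, C=0.1))  # constant 0: fast, one error
```

Changing C shifts the balance: a large C favours cheap but inaccurate programs, a small C favours exact but slow ones.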

How does GP work?

GP = evolutionary algorithm.
Evolutionary algorithm:

P = initial population
While (my favorite criterion):
  Selection = best functions in P according to some score
  Mutations = random perturbations of programs in the Selection
  Cross-over = merging of programs in the Selection
  P ≈ Selection + Mutations + Cross-over
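The loop above can be sketched in Python as follows. This is a minimal sketch under our own assumptions, not the authors' implementation: programs are hypothetical nested tuples over two Boolean inputs, the score is the number of errors on the examples, "my favorite criterion" is a fixed number of generations, and `MAX_SIZE` is a crude size cap we add so that trees cannot grow without bound.

```python
import random
from itertools import product

# Hypothetical program representation: nested tuples, e.g. ("xor", ("x", 0), ("x", 1)).
BINARY = ("and", "or", "xor")
MAX_SIZE = 40  # crude cap so that mutation and cross-over cannot grow trees without bound


def evaluate(t, x):
    if t[0] == "x":
        return x[t[1]]
    if t[0] == "not":
        return 1 - evaluate(t[1], x)
    a, b = evaluate(t[1], x), evaluate(t[2], x)
    return {"and": a & b, "or": a | b, "xor": a ^ b}[t[0]]


def size(t):
    return 1 + sum(size(c) for c in t[1:] if isinstance(c, tuple))


def random_tree(rng, depth=3):
    if depth == 0 or rng.random() < 0.3:
        return ("x", rng.randrange(2))
    if rng.random() < 0.2:
        return ("not", random_tree(rng, depth - 1))
    return (rng.choice(BINARY), random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def paths(t, prefix=()):
    """Positions of all subtrees of t, as tuples of child indices."""
    yield prefix
    for i, c in enumerate(t[1:], start=1):
        if isinstance(c, tuple):
            yield from paths(c, prefix + (i,))


def get(t, path):
    for i in path:
        t = t[i]
    return t


def put(t, path, new):
    if not path:
        return new
    i = path[0]
    return t[:i] + (put(t[i], path[1:], new),) + t[i + 1:]


def mutate(t, rng):
    """Random perturbation: replace a random subtree by a fresh random tree."""
    child = put(t, rng.choice(list(paths(t))), random_tree(rng, 2))
    return child if size(child) <= MAX_SIZE else t


def crossover(a, b, rng):
    """Merging: graft a random subtree of b at a random position of a."""
    donor = get(b, rng.choice(list(paths(b))))
    child = put(a, rng.choice(list(paths(a))), donor)
    return child if size(child) <= MAX_SIZE else a


def evolve(score, rng, pop_size=30, n_select=10, generations=20):
    population = [random_tree(rng) for _ in range(pop_size)]
    history = []  # best score at each generation
    for _ in range(generations):  # "my favorite criterion": a fixed budget here
        selection = sorted(population, key=score)[:n_select]
        history.append(score(selection[0]))
        n_new = pop_size - n_select
        mutations = [mutate(rng.choice(selection), rng) for _ in range(n_new // 2)]
        crosses = [crossover(rng.choice(selection), rng.choice(selection), rng)
                   for _ in range(n_new - n_new // 2)]
        population = selection + mutations + crosses  # P ≈ Selection + Mutations + Cross-over
    return min(population, key=score), history


# Usage: learn y = x0 XOR x1 from its four input/output pairs.
data = [(x, x[0] ^ x[1]) for x in product((0, 1), repeat=2)]
score = lambda t: sum(abs(evaluate(t, x) - y) for x, y in data)
best, history = evolve(score, random.Random(0))
print(score(best), history)
```

Because the Selection is copied into the next population, the best score recorded in `history` can never increase from one generation to the next; whether it reaches 0 depends on the seed and the budget.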

Does it work?

Definitely yes, for robust and multimodal optimization in complex domains (trees, bitstrings, …).

Which score? A nice question for mathematicians.

Why study GP?

- GP is studied by many people:
  - 5440 articles in the GP bibliography [5];
  - more than 880 authors.
- GP seemingly works: human-competitive results (http://www.genetic-programming.com/humancompetitive.html).
- There is nothing else for mining Turing-equivalent spaces of programs.
- It is probably better than random search.
- Few mathematical foundations in GP.
- Few open problems in computability, in particular with applications.

Formalization of GP

What is typical of GP?
- No halting criterion: we stop when time is exhausted.
- No use of prior knowledge; no use of f, even when you know it.

People (often) do not like GP because:
- it is slow and has no halting criterion;
- it uses the $y_i = f(x_i)$ and not f itself (unlike automatic code generation).

Are these two elements necessary?

Iterative algorithms

Black-box?

Formalization of GP

Summary:
- GP uses only the $f(x_i)$ and the $\mathrm{Time}(f, x_i)$.
- GP never halts: it outputs $O_1, O_2, O_3, \dots$

Can we do better?

Known results

Whenever f is available (and not only the $f(x_i)$), computing a program O such that
- $O \equiv f$,
- O is optimal for size (or speed, or space, …),
is not possible
(i.e. there is no Turing machine performing that task for all f).

A first (easy) good reason for GP.

Whenever f is available (and not only the $f(x_i)$), computing $O_1, O_2, \dots$ such that
- $O_p \equiv f$ for p sufficiently large,
- $\lim_{p \to \infty} \mathrm{size}(O_p)$ is optimal,
is possible, with proved convergence rates, e.g. by bloat penalization (the same algorithm as in the case below where f is not available; see details of the proof and of the algorithm in the paper).

A first (easy) good reason for GP.

Whenever f is not available (only the $f(x_i)$ are), computing $O_1, O_2, \dots$ such that
- $O_p \equiv f$ for p sufficiently large,
- $\lim_{p \to \infty} \mathrm{size}(O_p)$ is optimal,
is possible, with proved convergence rates, e.g. by bloat penalization:
- consider a population of programs; set n = 1;
- while (true):
  - select the best program P for a compromise between relevance on the n first examples and penalization of size, e.g. minimizing $\sum_{i<n} |P(x_i) - y_i| + C(|P|, n)$;
  - n = n + 1.

(See details of the proof and of the algorithm in the paper.)
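The selection rule above can be sketched as a Python generator that yields $O_1, O_2, \dots$, one program per value of n. This is an illustration under stated assumptions, not the algorithm of the paper: the population is a fixed, hypothetical pool of candidates with hand-assigned sizes (a real GP run would evolve it), and $C(|P|, n) = |P| \cdot \log(n+1)$ is just one penalty that increases with size and grows more slowly than n.

```python
import math
from typing import Callable, Iterator, Sequence, Tuple

Bits = Tuple[int, ...]
# Hypothetical candidate pool entry: (name, size |P|, function computed by P).
Candidate = Tuple[str, int, Callable[[Bits], int]]


def penalty(size: int, n: int) -> float:
    """Assumed form of C(|P|, n): increasing in |P|, growing more slowly than n."""
    return size * math.log(n + 1)


def bloat_penalized(candidates: Sequence[Candidate],
                    examples: Sequence[Tuple[Bits, int]]) -> Iterator[str]:
    """Yield the names of O_1, O_2, ...: at step n, the candidate P minimizing
    sum_{i<n} |P(x_i) - y_i| + C(|P|, n) over the first n examples."""
    n = 1
    while n <= len(examples):  # the slide says while (true); we stop when data runs out
        def objective(c: Candidate) -> float:
            _, size, program = c
            errors = sum(abs(program(x) - y) for x, y in examples[:n])
            return errors + penalty(size, n)
        yield min(candidates, key=objective)[0]
        n += 1


candidates = [
    ("const0", 1, lambda x: 0),
    ("x0", 1, lambda x: x[0]),
    ("x1", 1, lambda x: x[1]),
    ("xor", 3, lambda x: x[0] ^ x[1]),
    ("not(not(xor))", 7, lambda x: 1 - (1 - (x[0] ^ x[1]))),  # bloated equivalent
]
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)] * 10
examples = [(x, x[0] ^ x[1]) for x in inputs]  # target f = XOR
outputs = list(bloat_penalized(candidates, examples))
print(outputs[-1])
```

With the examples cycling through all four inputs, the wrong candidates accumulate errors linearly in n while the penalty grows only logarithmically, so the small exact program "xor" ends up selected; its bloated equivalent is never selected, since it makes the same errors and pays a larger penalty.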

A first (easy) good reason for GP.

Asymptotically (only!), finding an optimal function $O \equiv f$ is possible.
No halting criterion is possible
(the asymptotic formulation avoids the use of an oracle in 0’, the Turing degree of the halting problem).

Outline

- What is genetic programming
- Formal analysis of Genetic Programming
- Why is there nothing other than Genetic Programming?
  - Computability point of view
  - Complexity point of view:
    - Kolmogorov’s complexity with bounded time
    - Application to genetic programming

Kolmogorov’s complexity

Kolmogorov’s complexity of x: minimum size of a program generating x.

Kolmogorov’s complexity of x with time at most T: minimum size of a program generating x in time at most T.

Kolmogorov’s complexity in bounded time is computable.
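To see why the time-bounded version is computable, one can enumerate programs by increasing size and simulate each one for at most T steps, so that every simulation terminates. The sketch below does this for a hypothetical toy instruction set ('0' and '1' append a symbol, 'D' doubles the output) that is not a universal machine; only the search structure is meant to carry over, and `run` and `kolmogorov_time_bounded` are illustrative names.

```python
from itertools import product
from typing import Optional

# Hypothetical toy instruction set (not a universal machine):
#   '0' / '1' append that symbol to the output (cost: 1 step)
#   'D' doubles the output (cost: max(1, current output length) steps)
ALPHABET = "01D"


def run(program: str, max_steps: int) -> Optional[str]:
    """Simulate `program` for at most `max_steps` steps; None if the budget is exceeded."""
    out, steps = "", 0
    for op in program:
        steps += max(1, len(out)) if op == "D" else 1
        if steps > max_steps:
            return None  # cut the simulation off: this is what keeps the search finite
        out = out + out if op == "D" else out + op
    return out


def kolmogorov_time_bounded(x: str, max_steps: int) -> Optional[int]:
    """Size of a shortest program that outputs exactly x within max_steps steps."""
    # Enumerate programs by increasing size. In this toy language every program
    # outputting x uses at least len(x) steps, and x itself (as a literal program)
    # uses exactly len(x), so sizes up to len(x) suffice.
    for size in range(len(x) + 1):
        for ops in product(ALPHABET, repeat=size):
            if run("".join(ops), max_steps) == x:
                return size
    return None  # no program outputs x within the time bound


print(kolmogorov_time_bounded("11111111", 8))  # 4 (the program "1DDD")
print(kolmogorov_time_bounded("11111111", 7))  # None
```

The same string gets no program at all under a slightly smaller time budget, which illustrates how the bounded complexity depends on T.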

Kolmogorov’s complexity and genetic programming

GP uses expensive simulations of programs. Can we get rid of the simulation time, e.g. by using f not only as a black box? Essentially, no.

Example of a GP problem: find O as small as possible with
$\mathbb{E}\,\mathrm{Time}(O, x) < T_n$, $|O| < S_n$, $O(x) = y$.

If $T_n = \Omega(2^n)$ and some $S_n = O(\log n)$, this requires time at least $T_n / \mathrm{polynomial}(n)$.

Just simulating all programs shorter than $S_n$ and « faster » than $T_n$ is possible in time $\mathrm{polynomial}(n)\,T_n$. Hence simulation is optimal up to a polynomial factor.
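The upper bound in the last line follows from a simple count. The derivation below assumes that programs are binary strings and reads $S_n = O(\log n)$ as $S_n \le c\log_2 n$ for some constant $c$; it bounds the cost per input and ignores the (polynomial) overhead of the simulator itself.

```latex
\#\{O : |O| < S_n\} \;=\; \sum_{s=0}^{S_n-1} 2^{s} \;<\; 2^{S_n} \;\le\; 2^{c\log_2 n} \;=\; n^{c},
\qquad\text{so}\qquad
\underbrace{\#\{O : |O| < S_n\}}_{\le\, n^{c}} \times \underbrace{T_n}_{\text{steps per simulation}}
\;\le\; n^{c}\,T_n \;=\; \mathrm{polynomial}(n)\,T_n .
```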

Conclusion

Summary:
- GP typically solves problems in 0’ approximately.
- Much work on approximating NP-complete problems, but little on 0’.
- We provide a theoretical analysis of GP.

Conclusions:
- GP uses expensive simulations, but the simulation cost cannot be removed anyway.
- GP has no halting criterion, but no halting criterion can be found.
- Also, « bloat » penalization ensures consistency; this point suggests a parametrization of the usual algorithms.
