20
1 Computing in High Energy and Nuclear Physics, 13-17 February 2006, Mumbai, India Computing in High Energy and Nuclear Physics, 13-17 February 2006, Mumbai, India

1 Computing in High Energy and Nuclear Physics, 13-17 February 2006, Mumbai, India

Embed Size (px)

Citation preview

Page 1: 1 Computing in High Energy and Nuclear Physics, 13-17 February 2006, Mumbai, India

1

Computing in High Energy and Nuclear Physics, 13-17 February 2006, Mumbai, IndiaComputing in High Energy and Nuclear Physics, 13-17 February 2006, Mumbai, India

Page 2: 1 Computing in High Energy and Nuclear Physics, 13-17 February 2006, Mumbai, India

Liliana Teodorescu, Brunel University

2

Introduction to evolutionary computation Introduction to evolutionary computation Gene Expression Programming (GEP)Gene Expression Programming (GEP) Application of GEP for Event Selection Application of GEP for Event Selection ConclusionConclusion

Page 3: 1 Computing in High Energy and Nuclear Physics, 13-17 February 2006, Mumbai, India

Liliana Teodorescu, Brunel University

3

Evolutionary computation simulates the natural evolution on a computer

process leading to maintenance or increase of a population ability to survive and reproduce in a specific environment

quantitatively measured by evolutionary fitness

Goal of natural evolution - to generate a population of individuals with

increasing fitness

Goal of evolutionary computation - to generate a set of solutions (to a

problem) of increasing quality

Page 4: 1 Computing in High Energy and Nuclear Physics, 13-17 February 2006, Mumbai, India

Liliana Teodorescu, Brunel University

4

Individual – candidate solution to a problem

Chromosome – representation of the candidate solution

decoding encoding

Gene – constituent entity of the chromosome

Population – set of individuals/chromosomes

Fitness function – representation of how good a candidate solution is

Genetic operators – operators applied on chromosomes in orderto create genetic variation (other chromosomes)

Page 5: 1 Computing in High Energy and Nuclear Physics, 13-17 February 2006, Mumbai, India

Liliana Teodorescu, Brunel University

5

Natural evolution simulation - core of the evolutionary algorithms:

Initial population creation (randomly)

Fitness evaluation (of each chromosome)

Terminate?

Selection of individuals (proportional with fitness)

Reproduction (genetic operators)

Replacement of the current population with the new one

yes

no

Stop

StartRun

Problem definition Encoding of the candidate solution Fitness definition Run Decoding the best fitted chromosome = solution

New

generation

Basic evolutionary algorithm

Page 6: 1 Computing in High Energy and Nuclear Physics, 13-17 February 2006, Mumbai, India

Liliana Teodorescu, Brunel University

6

Genetic Algorithms (GA) (J. H. Holland, 1975) Genetic Programming (GP) (J. R. Koza, 1992)

Gene Expression Programming (GEP) (C. Ferreira, 2001)

Main differences

Encoding method Reproduction method

Page 7: 1 Computing in High Energy and Nuclear Physics, 13-17 February 2006, Mumbai, India

Liliana Teodorescu, Brunel University

7

works with two entities: chromosomes and expression trees

search for the computer program that solve the problem (as GP)

Candidate solution represented by an expression tree (ET)

)()( dcba Q

+

*

d

-

ca b

ET encoded in a chromosome: read ET left - right and top - down

Q*-+abcdQ means sqrt

Decoding the chromosome (translates the chromosome in an ET)•first line of ET (root) – first element of the chromosome•next line of ET – as many arguments needed by the element in the previous line

Encoding

Page 8: 1 Computing in High Energy and Nuclear Physics, 13-17 February 2006, Mumbai, India

Liliana Teodorescu, Brunel University

8

Reproduction

Genetic operators applied on chromosoms not on ET => always produce sintactically correct structures! Recombination – exchanges parts of two chromosomes Mutation – changes the value of a node Transposition – moves a part of the chromosome to another location in the same chromosome

e.g. Mutation: Q replaced with *

*

b +

-

a Qa

a

*

b +

-

a *a

a b

*b+a-aQab+//+b+babbabbbababbaaa *b+a-aQab+//+b+babbabbbababbaaa

Page 9: 1 Computing in High Energy and Nuclear Physics, 13-17 February 2006, Mumbai, India

Liliana Teodorescu, Brunel University

9

Chromosome – has one or more genes of equal length

Gene – head: contains both functions and terminals (length h) - tail: contains only terminals (length t)

t=h(n-1)+1 n – number of arguments of the function with the highest number of arguments

e.g. set of functions: Q,*,/,-,+ set of terminals: a,bn=2; h=15 (choosen) t =16 => length of gene=15+16=31

*b+a-aQab+//+b+babbabbbababbaaa

*

b +

-

a Qa

aET ends before the end of the gene!

Page 10: 1 Computing in High Energy and Nuclear Physics, 13-17 February 2006, Mumbai, India

Liliana Teodorescu, Brunel University

10

GEP for event selection cuts/selection criteria finding classification problem (signal/background classification) statistical learning approach

5000 training events (for classification rule extraction) 5000 test events (others than training events) limitations imposed by APS 3.0

S/N = 0.25, 1, 5

Data samples: Monte-Carlo simulation from BaBar experiment Ks production in e+e- (~10 GeV)

Software resources APS 3.0 (Automatic Problem Solver)

- commercial package (Windows based) - www.gepsoft.com - function finding, classification, time series analysis

Page 11: 1 Computing in High Energy and Nuclear Physics, 13-17 February 2006, Mumbai, India

Liliana Teodorescu, Brunel University

11

Functions and constants to be used in the classification rules (cut type rule) 10 functions - AND1 (x<0 and y<0 => 1 else 0), AND2 (x0 and y0 => 1 else 0) - OR1 (x<0 or y<0 => 1 else 0), OR2 (x0 or y0 => 1 else 0) - <, >, <=, >=, =, != 36 functions -previous 10 functions + common mathematical functions constants - floating point constants (-10,10)

Data – variables usually used in a cut based analysis for selection - doca (distance of closest approach) - RXY, |RZ| (region around interaction point) - |cos(hel)| - SFL (Signed Flight Length) - Fsig (Flight Significance) - Pchi (2 probability of the vertex) - Mass (KS reconstructed mass)

SK

GEP parameters - fitness function: number of hits (events correctly classified) - gene length (head = 1-20) - no. of chromosomes per generation: 100 - no. of generations per run: 1000-20000 - genetic operators rates: mutation 0.044, inversion 0.1, transposition 0.3, recombination 0.1

Page 12: 1 Computing in High Energy and Nuclear Physics, 13-17 February 2006, Mumbai, India

Liliana Teodorescu, Brunel University

12

Fsig 5.26,

Rxy < 0.19,

doca <1,

Pchi > 0

No. of genes = 1, Head length =10

Classification Accuracy = 95.36%

Data sample: S/N =0.2510 functions

Page 13: 1 Computing in High Energy and Nuclear Physics, 13-17 February 2006, Mumbai, India

Liliana Teodorescu, Brunel University

13

Data samples: S/N =0.25, 1, 5 10 functions

95%

5%

92%

8%

8%

92%

S/N = 0.25

S/N = 1

S/N = 5

Page 14: 1 Computing in High Energy and Nuclear Physics, 13-17 February 2006, Mumbai, India

Liliana Teodorescu, Brunel University

14

Head Acc. (%) Selection criteria

1 83.34 Fsig 9.93

2 90.76 Fsig 8.80, doca <1

3 94.88 Fsig > 3.67, Rxy Pchi

4 94.88 Fsig > 3.67, Rxy Pchi

5 95.04 Fsig 3.63, |Rz| 2.65, Rxy < Pchi

7 95.04 Fsig 3.64, Rxy < Pchi, Pchi > 0

10 95.360 Fsig 5.26, Rxy < 0.19, doca <1, Pchi > 0

20 95.50 Fsig > 4.1, Rxy 0.2, SFL > 0.2, Pchi > 0,

doca > 0, Rxy Mass

Data sample: S/N =0.2510 functions

Page 15: 1 Computing in High Energy and Nuclear Physics, 13-17 February 2006, Mumbai, India

Liliana Teodorescu, Brunel University

15

20 95.50 Fsig > 4.1, Rxy 0.2, SFL > 0.2, Pchi > 0,

doca > 0, Rxy Mass

Cut-based analysis

Fsig 4.0Rxy 0.2cmSFL 0cmPchi > 0.001

doca 0.4cm|Rz| 2.8cm

GEP

No cut Fsig 4.1Rxy 0.2cmSFL 0.2cmPchi > 0

Previous cuts +doca > 0Rxy Mass

Previous cuts +doca 0.4| Rz| 2.8cm

ReductionS: 15%B: 98%

ReductionS: 16%B: 98.3%

Page 16: 1 Computing in High Energy and Nuclear Physics, 13-17 February 2006, Mumbai, India

Liliana Teodorescu, Brunel University

16

5 95.04 Fsig 3.63, |Rz| 2.65, Rxy < Pchi

Fsig 3.63|Rz| 2.65cmReduction

S: 7.6%B: 87.8%

ReductionS: 16%B: 97.8%

Fsig 3.63|Rz| 2.65cmRxy<Pchi

GEP

Page 17: 1 Computing in High Energy and Nuclear Physics, 13-17 February 2006, Mumbai, India

Liliana Teodorescu, Brunel University

17

Data samples: S/N =0.25, 1, 5 10 functions

S/N = 0.25

S/N = 1

S/N = 5

Page 18: 1 Computing in High Energy and Nuclear Physics, 13-17 February 2006, Mumbai, India

Liliana Teodorescu, Brunel University

18

No. of genes = 1, Head length =10

Classification Accuracy = 95.00%

Data sample: S/N =0.2536 mathematical functions

Page 19: 1 Computing in High Energy and Nuclear Physics, 13-17 February 2006, Mumbai, India

Liliana Teodorescu, Brunel University

19

Data sample: S/N =0.2536 functions

Classification Accuracy 95%

Page 20: 1 Computing in High Energy and Nuclear Physics, 13-17 February 2006, Mumbai, India

Liliana Teodorescu, Brunel University

20

GEP is still in the R&D phase needs software development -> underway

fast identification of powerful cuts signal/background separation of 92-95% accuracy for samples with S/N = 0.25, 1, 5 potential of discovering new correlations between variables large number of selection functions does not improve the classification accuracy

GEP allows