Upload
buck-lawson
View
215
Download
0
Embed Size (px)
Citation preview
1
Computing in High Energy and Nuclear Physics, 13-17 February 2006, Mumbai, IndiaComputing in High Energy and Nuclear Physics, 13-17 February 2006, Mumbai, India
Liliana Teodorescu, Brunel University
2
Introduction to evolutionary computation Introduction to evolutionary computation Gene Expression Programming (GEP)Gene Expression Programming (GEP) Application of GEP for Event Selection Application of GEP for Event Selection ConclusionConclusion
Liliana Teodorescu, Brunel University
3
Evolutionary computation simulates the natural evolution on a computer
process leading to maintenance or increase of a population ability to survive and reproduce in a specific environment
quantitatively measured by evolutionary fitness
Goal of natural evolution - to generate a population of individuals with
increasing fitness
Goal of evolutionary computation - to generate a set of solutions (to a
problem) of increasing quality
Liliana Teodorescu, Brunel University
4
Individual – candidate solution to a problem
Chromosome – representation of the candidate solution
decoding encoding
Gene – constituent entity of the chromosome
Population – set of individuals/chromosomes
Fitness function – representation of how good a candidate solution is
Genetic operators – operators applied on chromosomes in orderto create genetic variation (other chromosomes)
Liliana Teodorescu, Brunel University
5
Natural evolution simulation - core of the evolutionary algorithms:
Initial population creation (randomly)
Fitness evaluation (of each chromosome)
Terminate?
Selection of individuals (proportional with fitness)
Reproduction (genetic operators)
Replacement of the current population with the new one
yes
no
Stop
StartRun
Problem definition Encoding of the candidate solution Fitness definition Run Decoding the best fitted chromosome = solution
New
generation
Basic evolutionary algorithm
Liliana Teodorescu, Brunel University
6
Genetic Algorithms (GA) (J. H. Holland, 1975) Genetic Programming (GP) (J. R. Koza, 1992)
Gene Expression Programming (GEP) (C. Ferreira, 2001)
Main differences
Encoding method Reproduction method
Liliana Teodorescu, Brunel University
7
works with two entities: chromosomes and expression trees
search for the computer program that solve the problem (as GP)
Candidate solution represented by an expression tree (ET)
)()( dcba Q
+
*
d
-
ca b
ET encoded in a chromosome: read ET left - right and top - down
Q*-+abcdQ means sqrt
Decoding the chromosome (translates the chromosome in an ET)•first line of ET (root) – first element of the chromosome•next line of ET – as many arguments needed by the element in the previous line
Encoding
Liliana Teodorescu, Brunel University
8
Reproduction
Genetic operators applied on chromosoms not on ET => always produce sintactically correct structures! Recombination – exchanges parts of two chromosomes Mutation – changes the value of a node Transposition – moves a part of the chromosome to another location in the same chromosome
e.g. Mutation: Q replaced with *
*
b +
-
a Qa
a
*
b +
-
a *a
a b
*b+a-aQab+//+b+babbabbbababbaaa *b+a-aQab+//+b+babbabbbababbaaa
Liliana Teodorescu, Brunel University
9
Chromosome – has one or more genes of equal length
Gene – head: contains both functions and terminals (length h) - tail: contains only terminals (length t)
t=h(n-1)+1 n – number of arguments of the function with the highest number of arguments
e.g. set of functions: Q,*,/,-,+ set of terminals: a,bn=2; h=15 (choosen) t =16 => length of gene=15+16=31
*b+a-aQab+//+b+babbabbbababbaaa
*
b +
-
a Qa
aET ends before the end of the gene!
Liliana Teodorescu, Brunel University
10
GEP for event selection cuts/selection criteria finding classification problem (signal/background classification) statistical learning approach
5000 training events (for classification rule extraction) 5000 test events (others than training events) limitations imposed by APS 3.0
S/N = 0.25, 1, 5
Data samples: Monte-Carlo simulation from BaBar experiment Ks production in e+e- (~10 GeV)
Software resources APS 3.0 (Automatic Problem Solver)
- commercial package (Windows based) - www.gepsoft.com - function finding, classification, time series analysis
Liliana Teodorescu, Brunel University
11
Functions and constants to be used in the classification rules (cut type rule) 10 functions - AND1 (x<0 and y<0 => 1 else 0), AND2 (x0 and y0 => 1 else 0) - OR1 (x<0 or y<0 => 1 else 0), OR2 (x0 or y0 => 1 else 0) - <, >, <=, >=, =, != 36 functions -previous 10 functions + common mathematical functions constants - floating point constants (-10,10)
Data – variables usually used in a cut based analysis for selection - doca (distance of closest approach) - RXY, |RZ| (region around interaction point) - |cos(hel)| - SFL (Signed Flight Length) - Fsig (Flight Significance) - Pchi (2 probability of the vertex) - Mass (KS reconstructed mass)
SK
GEP parameters - fitness function: number of hits (events correctly classified) - gene length (head = 1-20) - no. of chromosomes per generation: 100 - no. of generations per run: 1000-20000 - genetic operators rates: mutation 0.044, inversion 0.1, transposition 0.3, recombination 0.1
Liliana Teodorescu, Brunel University
12
Fsig 5.26,
Rxy < 0.19,
doca <1,
Pchi > 0
No. of genes = 1, Head length =10
Classification Accuracy = 95.36%
Data sample: S/N =0.2510 functions
Liliana Teodorescu, Brunel University
13
Data samples: S/N =0.25, 1, 5 10 functions
95%
5%
92%
8%
8%
92%
S/N = 0.25
S/N = 1
S/N = 5
Liliana Teodorescu, Brunel University
14
Head Acc. (%) Selection criteria
1 83.34 Fsig 9.93
2 90.76 Fsig 8.80, doca <1
3 94.88 Fsig > 3.67, Rxy Pchi
4 94.88 Fsig > 3.67, Rxy Pchi
5 95.04 Fsig 3.63, |Rz| 2.65, Rxy < Pchi
7 95.04 Fsig 3.64, Rxy < Pchi, Pchi > 0
10 95.360 Fsig 5.26, Rxy < 0.19, doca <1, Pchi > 0
20 95.50 Fsig > 4.1, Rxy 0.2, SFL > 0.2, Pchi > 0,
doca > 0, Rxy Mass
Data sample: S/N =0.2510 functions
Liliana Teodorescu, Brunel University
15
20 95.50 Fsig > 4.1, Rxy 0.2, SFL > 0.2, Pchi > 0,
doca > 0, Rxy Mass
Cut-based analysis
Fsig 4.0Rxy 0.2cmSFL 0cmPchi > 0.001
doca 0.4cm|Rz| 2.8cm
GEP
No cut Fsig 4.1Rxy 0.2cmSFL 0.2cmPchi > 0
Previous cuts +doca > 0Rxy Mass
Previous cuts +doca 0.4| Rz| 2.8cm
ReductionS: 15%B: 98%
ReductionS: 16%B: 98.3%
Liliana Teodorescu, Brunel University
16
5 95.04 Fsig 3.63, |Rz| 2.65, Rxy < Pchi
Fsig 3.63|Rz| 2.65cmReduction
S: 7.6%B: 87.8%
ReductionS: 16%B: 97.8%
Fsig 3.63|Rz| 2.65cmRxy<Pchi
GEP
Liliana Teodorescu, Brunel University
17
Data samples: S/N =0.25, 1, 5 10 functions
S/N = 0.25
S/N = 1
S/N = 5
Liliana Teodorescu, Brunel University
18
No. of genes = 1, Head length =10
Classification Accuracy = 95.00%
Data sample: S/N =0.2536 mathematical functions
Liliana Teodorescu, Brunel University
19
Data sample: S/N =0.2536 functions
Classification Accuracy 95%
Liliana Teodorescu, Brunel University
20
GEP is still in the R&D phase needs software development -> underway
fast identification of powerful cuts signal/background separation of 92-95% accuracy for samples with S/N = 0.25, 1, 5 potential of discovering new correlations between variables large number of selection functions does not improve the classification accuracy
GEP allows