DD
S.TowersTerraFerMA
TerraFerMA
A Suite of Multivariate Analysis Tools
Sherry Towers, SUNY-SB
Version 1.0 has been released!
Usable by anyone with access to the CLHEP and ROOT libraries
www-d0.fnal.gov/~smjt/multiv.html
TerraFerMA
TerraFerMA = Fermilab Multivariate Analysis (aka “FerMA”)
Convenient interface to various multivariate analysis packages (e.g. MLPfit, Jetnet, PDE/GEM, Fisher discriminant, binned likelihood, etc.)
User fills signal and background (and data) “Samples”, which are then used as input to FerMA methods…
Includes a method to sort variables to determine which are the best discriminators between signal and background.
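The slides do not say how the sorting method ranks variables. As a rough illustration only, the sketch below ranks each variable by a simple one-dimensional separation score (difference of means divided by the combined spread). The score, the function names and the toy values are all assumptions made for this example, not TerraFerMA's algorithm or API.

```python
import statistics

def separation(signal_values, background_values):
    """|difference of means| divided by the combined spread of the two samples."""
    mean_s = statistics.fmean(signal_values)
    mean_b = statistics.fmean(background_values)
    var_s = statistics.pvariance(signal_values)
    var_b = statistics.pvariance(background_values)
    return abs(mean_s - mean_b) / (var_s + var_b) ** 0.5

def rank_variables(signal, background):
    """signal/background map a variable name to a list of values.
    Returns the variable names, best discriminator first."""
    scores = {name: separation(signal[name], background[name]) for name in signal}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy samples: "pt" separates the classes well, "eta" hardly at all.
signal = {"pt": [50.0, 55.0, 60.0, 65.0], "eta": [0.1, -0.2, 0.3, 0.0]}
background = {"pt": [20.0, 25.0, 30.0, 35.0], "eta": [0.0, 0.2, -0.3, 0.1]}
ranking = rank_variables(signal, background)
print(ranking)
```

A score of this kind looks at one variable at a time, so it cannot see discriminating power that only appears through correlations.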
Also includes useful stats tools (correlations, means, RMSs) and a method to detect outliers.
Using a multivariate package chosen by the user, FerMA yields the probability that a data event is signal or background.
TerraFerMA makes it trivial to compare the performance of different multivariate techniques!
It also makes it easy to reduce the number of discriminators used in an analysis!
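For readers unfamiliar with the stats tools listed on this slide, here is a minimal sketch of a correlation coefficient and a simple outlier flag. The function names and the 3-sigma rule are assumptions chosen for illustration; the slides do not describe how TerraFerMA detects outliers.

```python
import statistics

def pearson(xs, ys):
    """Linear (Pearson) correlation coefficient between two variables."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    norm_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (norm_x * norm_y)

def outliers(values, n_sigma=3.0):
    """Indices of values further than n_sigma RMS spreads from the mean."""
    mean = statistics.fmean(values)
    rms = statistics.pstdev(values)  # RMS spread about the mean
    return [i for i, v in enumerate(values) if abs(v - mean) > n_sigma * rms]

correlation = pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
flagged = outliers([1.0] * 20 + [100.0])
print(correlation, flagged)
```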
Overview of common multivariate analysis techniques:
Simple techniques: ignore all correlations between discriminators… For example, simple techniques based on square cuts, or likelihood techniques which obtain the multi-D likelihood from the product of 1-D likelihoods.
Advantage: fast, easy to understand. Easy to tell if modelling of data is sound.
Disadvantage: useful discriminating info is lost if correlations are ignored.
FerMA includes a method to determine optimal square cuts in a multidimensional parameter space.
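To make the idea of optimal square cuts concrete, the sketch below scans a grid of lower thresholds, one per variable, and keeps the combination with the highest S/sqrt(S+B), the figure of merit quoted later in the talk. The brute-force grid scan, the function names and the toy events are assumptions for illustration; the slides do not say how FerMA's own optimiser works.

```python
import itertools
import math

def figure_of_merit(n_sig, n_bkg):
    """S/sqrt(S+B); defined as 0 when no event passes."""
    total = n_sig + n_bkg
    return n_sig / math.sqrt(total) if total > 0 else 0.0

def passes(event, cuts):
    """An event passes if every variable exceeds its threshold."""
    return all(value > cut for value, cut in zip(event, cuts))

def best_square_cuts(signal, background, grid):
    """grid holds one list of candidate thresholds per variable."""
    best_cuts, best_fom = None, -1.0
    for cuts in itertools.product(*grid):
        n_sig = sum(passes(event, cuts) for event in signal)
        n_bkg = sum(passes(event, cuts) for event in background)
        fom = figure_of_merit(n_sig, n_bkg)
        if fom > best_fom:
            best_cuts, best_fom = cuts, fom
    return best_cuts, best_fom

# Hypothetical two-variable toy events.
signal = [(3.0, 2.0), (4.0, 3.0), (5.0, 2.5), (6.0, 3.5)]
background = [(1.0, 2.5), (2.0, 0.5), (3.5, 1.0), (4.5, 0.8)]
grid = [[0.0, 2.5], [0.0, 1.5]]
best_cuts, best_fom = best_square_cuts(signal, background, grid)
print(best_cuts, best_fom)
```

The number of grid points grows as the product of the candidate counts per variable, so a full scan quickly becomes expensive as variables are added.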
More powerful...
More complicated techniques take into account simple (linear) correlations between discriminators:
ANOVA/MANCOVA
H-Matrix
Fisher discriminant*
Principal component analysis*
Projection correlation transformations*
Optimal Observables
and many, many more…
Advantage: fast, more powerful.
Disadvantage: can be a bit harder to understand, systematics can be harder to assess. Harder to tell if modelling of data is sound.
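As one example of a technique that uses linear correlations, here is a minimal two-variable Fisher discriminant: the weights are w = W^-1 (mu_s - mu_b), where W is the sum of the signal and background covariance matrices. This is the textbook construction written out by hand, with hypothetical function names and toy events; it is not TerraFerMA's interface.

```python
import statistics

def mean_vector(events):
    return [statistics.fmean(column) for column in zip(*events)]

def covariance_2d(events, mean):
    """Returns (c_xx, c_xy, c_yy) for a list of (x, y) events."""
    n = len(events)
    cxx = sum((x - mean[0]) ** 2 for x, _ in events) / n
    cyy = sum((y - mean[1]) ** 2 for _, y in events) / n
    cxy = sum((x - mean[0]) * (y - mean[1]) for x, y in events) / n
    return cxx, cxy, cyy

def fisher_weights(signal, background):
    mu_s, mu_b = mean_vector(signal), mean_vector(background)
    cov_s = covariance_2d(signal, mu_s)
    cov_b = covariance_2d(background, mu_b)
    # Within-class matrix W = C_signal + C_background
    wxx, wxy, wyy = (a + b for a, b in zip(cov_s, cov_b))
    det = wxx * wyy - wxy * wxy
    dx, dy = mu_s[0] - mu_b[0], mu_s[1] - mu_b[1]
    # w = W^-1 (mu_s - mu_b), using the explicit 2x2 inverse
    return ((wyy * dx - wxy * dy) / det, (-wxy * dx + wxx * dy) / det)

# Toy events: the classes differ only in the mean of x, but x and y are correlated.
signal = [(2.0, 1.0), (3.0, 3.0), (4.0, 2.0), (5.0, 4.0)]
background = [(0.0, 1.0), (1.0, 3.0), (2.0, 2.0), (3.0, 4.0)]
weights = fisher_weights(signal, background)
print(weights)
```

In this toy the second variable has the same mean in both samples, yet it receives a non-zero weight because it is correlated with the first. A technique that ignores correlations would discard it.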
Probability correlation transformations (ProCor)
ProCor is default multivariate package in TerraFerMA.
Very fast
(Relatively) easy to understand
Essentially, ProCor maps every point in signal (or background) MC onto a multi-dimensional Gaussian PDF.
Mapping is optimal for MC sets with linear correlations between variables
If mapping is not optimal, ProCor tells you!
Most powerful...
Analytic/binned likelihood
Advantage: easy to understand
Disadvantage: difficult to implement with many variables
Neural Networks
Advantage: powerful, reasonably fast
Disadvantage: Black box! Many parameters of method, and systematics can be difficult to assess
Kernel Estimation (Gaussian Expansion Method = GEM) (Static-Kernel Probability Density Estimation = PDE)
Advantage: powerful, easy to understand. Unbinned estimate of original PDF. Few parameters of method.
Disadvantage: a bit slow.
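The first item in the list above, a binned likelihood, can be sketched in one dimension as follows: histogram the signal and background MC, then read the signal probability off the bin the data event falls into. Equal prior weights for signal and background, the function names and the toy values are assumptions made for this example.

```python
def histogram(values, edges):
    """Normalised bin contents; values outside the range are ignored."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        for i in range(len(counts)):
            if edges[i] <= value < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [count / total for count in counts]

def signal_probability(x, sig_hist, bkg_hist, edges):
    """P(signal | x) from the two normalised histograms, assuming equal priors."""
    for i in range(len(sig_hist)):
        if edges[i] <= x < edges[i + 1]:
            s, b = sig_hist[i], bkg_hist[i]
            return s / (s + b) if s + b > 0 else 0.5
    return 0.5  # outside the histogram range: no information

edges = [0.0, 1.0, 2.0, 3.0]
sig_hist = histogram([0.5, 1.5, 1.6, 2.2, 2.5, 2.8], edges)
bkg_hist = histogram([0.1, 0.2, 0.7, 1.1, 1.9, 2.4], edges)
prob = signal_probability(2.5, sig_hist, bkg_hist, edges)
print(prob)
```

With d variables and n bins per variable the histogram has n^d bins to populate, which is one way to see why the approach is difficult to implement with many variables.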
Gaussian Expansion Method/Probability Density Estimation
All kernel PDF estimation methods are developed from a very simple idea…
If a data point lies in a region where the clustering of signal MC points is relatively tight, and that of bkgnd MC points is relatively loose, then that point is more likely to be signal.
GEM
Whether the clustering is “relatively tight” can be determined from the local covariance matrix, calculated from nearest neighbours to a point
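A minimal sketch of that local covariance calculation in two dimensions: take the k MC points nearest to the point of interest and compute their covariance matrix. Using the determinant as a single "tightness" number, the choice k = 4, and the toy MC points are assumptions for this illustration, not details taken from GEM.

```python
import math

def local_covariance(point, sample, k):
    """2x2 covariance matrix of the k sample points nearest to `point`."""
    neighbours = sorted(sample, key=lambda p: math.dist(p, point))[:k]
    mean_x = sum(p[0] for p in neighbours) / k
    mean_y = sum(p[1] for p in neighbours) / k
    cxx = sum((p[0] - mean_x) ** 2 for p in neighbours) / k
    cyy = sum((p[1] - mean_y) ** 2 for p in neighbours) / k
    cxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in neighbours) / k
    return [[cxx, cxy], [cxy, cyy]]

def determinant(matrix):
    return matrix[0][0] * matrix[1][1] - matrix[0][1] * matrix[1][0]

# Toy MC: signal clusters tightly near the origin, background is spread out.
signal_mc = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (3.0, 3.0), (3.2, 3.1)]
background_mc = [(-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0), (1.0, 1.0), (4.0, 4.0), (-4.0, 4.0)]
data_point = (0.05, 0.05)

sig_cov = local_covariance(data_point, signal_mc, k=4)
bkg_cov = local_covariance(data_point, background_mc, k=4)
print(determinant(sig_cov), determinant(bkg_cov))
```

Here the signal determinant is far smaller than the background one, so the data point sits where signal MC is relatively tight and is more likely to be signal.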
GEM/PDE
But we also want an estimate of the probability density...
GEM/PDE uses the idea that any continuous function can be modelled from the sum of “kernel” functions (similar to the idea behind Fourier series)
GEM/PDE use multi-dimensional Gaussian kernels
Each Gaussian kernel is centred about an MC point… widths of the Gaussian come from the local covariance matrix at that point
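The kernel sum can be sketched in one dimension as follows: the density estimate at x is the average of Gaussians centred on the MC points, and the signal probability follows from the signal and background estimates. For brevity this sketch gives every kernel the same fixed width, which is a simplification of the local-covariance widths described above; the names, the width and the toy values are assumptions.

```python
import math

def gaussian(x, mean, width):
    """Unit-normalised 1-D Gaussian."""
    z = (x - mean) / width
    return math.exp(-0.5 * z * z) / (width * math.sqrt(2.0 * math.pi))

def kernel_density(x, mc_points, width):
    """Average of Gaussian kernels, one centred on each MC point."""
    return sum(gaussian(x, p, width) for p in mc_points) / len(mc_points)

def signal_probability(x, signal_mc, background_mc, width):
    """P(signal | x), assuming equal signal and background priors."""
    s = kernel_density(x, signal_mc, width)
    b = kernel_density(x, background_mc, width)
    return s / (s + b)

# Hypothetical toy MC samples.
signal_mc = [0.9, 1.0, 1.1, 1.2]
background_mc = [-0.5, 0.0, 0.4, 2.5]
p_near_signal = signal_probability(1.0, signal_mc, background_mc, width=0.3)
p_near_background = signal_probability(0.0, signal_mc, background_mc, width=0.3)
print(p_near_signal, p_near_background)
```

Every evaluation loops over all MC points, which fits the "a bit slow" caveat given earlier for kernel methods.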
GEM: 1-D Gaussian
GEM/PDE: 1-D Gaussian
Boring details...
The case for fewer discriminators…
Using a large number of variables indiscriminately can indicate a lack of forethought in the design and conceptualization of an analysis
Also, each added variable makes it more difficult to determine if modelling of data is sound, and makes the analysis more difficult to understand
And each added variable adds statistical noise… This can degrade overall discrimination power!
Optimising discrimination…
Maximise S/sqrt(S+B), or:
The curse of too many variables
Signal: 5D Gaussian, μ = (1,0,0,0,0), σ = (1,1,1,1,1)
Bkgnd: 5D Gaussian, μ = (0,0,0,0,0), σ = (1,1,1,1,1)
Only difference between signal and background is in first dimension. Other four dimensions are “useless” discriminators
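A short check of why the other four dimensions are useless, assuming the two tuples quoted for each sample are the Gaussian means and widths and that the variables are uncorrelated: the exact log-likelihood ratio reduces to x1 - 1/2, so it does not change when the last four coordinates are varied. The function names are hypothetical.

```python
import math

MU_SIGNAL = (1.0, 0.0, 0.0, 0.0, 0.0)
MU_BACKGROUND = (0.0, 0.0, 0.0, 0.0, 0.0)

def log_gaussian(x, mu):
    """Log-density of an uncorrelated Gaussian with unit width in every dimension."""
    chi2 = sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
    return -0.5 * chi2 - 0.5 * len(x) * math.log(2.0 * math.pi)

def log_likelihood_ratio(x):
    """log[ P(x | signal) / P(x | background) ]"""
    return log_gaussian(x, MU_SIGNAL) - log_gaussian(x, MU_BACKGROUND)

# Same first coordinate, very different values in the other four dimensions.
ratio_a = log_likelihood_ratio((0.3, 0.0, 0.0, 0.0, 0.0))
ratio_b = log_likelihood_ratio((0.3, 2.0, -1.5, 0.7, 4.0))
print(ratio_a, ratio_b)
```

A method that estimates the two PDFs from a finite MC sample can only approximate this cancellation, so the four extra dimensions contribute statistical fluctuations and no information.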
A “real-world” example…
A Tevatron Run I analysis used a 7-variable NN to discriminate between signal and background.
Were all 7 needed?
Ran the signal and background n-tuples through the TerraFerMA interface to the sorting method…
Another “real-world” example…
A Tevatron “physics-object-ID” method uses 9 variables in the analysis.
How many are actually needed?
Summary
Careful examination of discriminators used in a multivariate analysis is always a good idea!
Reduction of number of variables can simplify analysis considerably, and can even increase discrimination power!
TerraFerMA Version 1.0
TerraFerMA documentation: www-d0.fnal.gov/~smjt/ferma.ps
TerraFerMA users’ guide: www-d0.fnal.gov/~smjt/guide.ps
TerraFerMA package: …/ferma.tar.gz (includes an example program in examples/simple/simple.cpp)
TerraFerMA Version 1.0
Soon to be included:
Support Vector Machines
Methods to fit for the fraction of signal and bkgrnd in a data sample
Ensembles (many samples grouped together)
Enhanced ability to write-out/read-in NN weights
Want more? Let me know!
Summary
TerraFerMA is: a platform of very powerful multivariate analysis tools. In all test applications to date, TerraFerMA has significantly improved existing analyses!