Top Thinkshop-2 Nov. 10-12, 2000 Pushpa Bhat
Advanced Analysis Algorithms for Top Analysis
Pushpa Bhat
Fermilab
Top Thinkshop 2, Fermilab, IL, November 2000
A reasonable man adapts himself to the world. An unreasonable man persists in trying to adapt the world to himself. So, all progress depends on the unreasonable man.
- Bernard Shaw
What do we gain?
b-tag efficiency in Run I: DØ ~20%, CDF ~53%. But DØ was able to measure the top quark mass with a precision approaching that of CDF, by using multivariate techniques to separate signal and background while minimizing the correlation of the selection with the top quark mass.
Optimal Analysis Methods
The new generation of experiments will be far more demanding than the previous one in data handling at all stages. The time-honored procedure of choosing and applying cuts on one event variable at a time is rarely optimal! The measurements being multivariate, the optimal methods of analysis are necessarily multivariate:
- Discriminant Analysis: Partition the multidimensional variable space; identify boundaries between classes of objects
- Cluster Analysis: Assign objects to groups based on similarity
- Regression Analysis: Functional approximation/fitting
Data Analysis Tasks
- Particle Identification: e-ID, μ-ID, b-ID, τ-ID, q/g
- Signal/Background Event Classification: Signals of new physics are rare and small (finding a "jewel" in a haystack)
- Parameter Estimation: t mass, H mass, track parameters, for example
- Function Approximation: Correction functions, tag rates, fake rates
- Data Exploration: Data-driven extraction of information, latent structure analysis
Why Multivariate Methods?
Because they are optimal!
Example: a linear discriminant in two variables, D(x1, x2) = 2.014 x1 + 1.592 x2
[Figure: scatter plots in the (x1, x2) plane showing the discriminant boundary]
Optimal Event Selection

r(x) = p(x|s) p(s) / [ p(x|b) p(b) ]

defines decision boundaries that minimize the probability of misclassification.

So, the problem mathematically reduces to that of calculating r(x), the Bayes Discriminant Function, or the probability densities.

Posterior probability:

p(s|x) = p(x|s) p(s) / [ p(x|s) p(s) + p(x|b) p(b) ] = r / (1 + r)
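The Bayes discriminant r(x) and the posterior r/(1+r) on this slide can be sketched numerically. This is a toy illustration, assuming hypothetical 1-D Gaussian class-conditional densities (signal at +1, background at -1); the function names are ours, not from the talk.

```python
import math

def gauss(x, mu, sigma):
    """Normal probability density N(x; mu, sigma)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def bayes_discriminant(x, p_s=0.5, p_b=0.5):
    """r(x) = p(x|s) p(s) / (p(x|b) p(b)) for toy 1-D Gaussian densities."""
    likelihood_s = gauss(x, mu=+1.0, sigma=1.0)  # assumed signal density
    likelihood_b = gauss(x, mu=-1.0, sigma=1.0)  # assumed background density
    return (likelihood_s * p_s) / (likelihood_b * p_b)

def posterior_signal(x, p_s=0.5, p_b=0.5):
    """p(s|x) = r / (1 + r)."""
    r = bayes_discriminant(x, p_s, p_b)
    return r / (1.0 + r)

# With equal priors and symmetric densities, x = 0 is the decision boundary:
print(posterior_signal(0.0))   # 0.5
print(posterior_signal(3.0))   # close to 1: clearly signal-like
```

Cutting on the posterior (or equivalently on r) at a fixed value traces out exactly the decision boundaries described above.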
Probability Density Estimators

Histogramming: the basic problem of non-parametric density estimation is very simple! Histogram the data in M bins in each of the d feature variables: M^d bins → Curse of Dimensionality. In high dimensions we would either require a huge number of data points, or most of the bins would be empty, leading to an estimated density of zero.

But the variables are generally correlated and hence tend to be restricted to a sub-space → Intrinsic Dimensionality.
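The M^d blow-up is easy to make concrete. A minimal sketch, with an assumed budget of 10 bins per variable and 100,000 events:

```python
# Number of histogram bins M^d for M bins per variable in d dimensions,
# versus a fixed budget of N data points (illustrative numbers, not from the talk).
M, N = 10, 100_000
for d in (1, 2, 5, 10):
    bins = M ** d
    print(f"d={d:2d}: {bins:>12d} bins, ~{N / bins:.4g} events per bin")
```

Already at d = 5 there is on average one event per bin; at d = 10 almost every bin is empty, which is the "estimated density of zero" problem above.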
Kernel-Based Methods

Akin to histogramming, but adopts importance sampling. Place in the d-dimensional space a hypercube of side h centered on each data point x_n:

p̃(x) = (1/N) Σ_{n=1}^{N} (1/h^d) H( (x − x_n)/h )

where N = number of data points, H(u) = 1 if x_n lies in the hypercube and 0 otherwise, and h = smoothing parameter.

The estimate will have discontinuities. These can be smoothed out using different forms for the kernel function H(u). A common choice is a multivariate Gaussian kernel:

p̃(x) = (1/N) Σ_{n=1}^{N} (1/(2πh²)^{d/2}) exp( −|x − x_n|² / 2h² )
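The Gaussian-kernel estimate above can be sketched in a few lines of NumPy. This is a toy check on a 1-D standard normal sample; the function name and the choice h = 0.3 are ours:

```python
import numpy as np

def kde_gaussian(x, data, h):
    """Gaussian kernel density estimate
    p~(x) = (1/N) sum_n (2*pi*h^2)^(-d/2) exp(-|x - x_n|^2 / (2 h^2)).
    x: point of shape (d,); data: array of shape (N, d); h: smoothing parameter.
    """
    N, d = data.shape
    sq_dist = np.sum((data - x) ** 2, axis=1)          # |x - x_n|^2 for all n
    norm = (2.0 * np.pi * h ** 2) ** (d / 2.0)
    return np.mean(np.exp(-sq_dist / (2.0 * h ** 2))) / norm

rng = np.random.default_rng(0)
sample = rng.normal(0.0, 1.0, size=(5000, 1))          # 1-D standard normal
est = kde_gaussian(np.array([0.0]), sample, h=0.3)
print(est)  # near the true density 1/sqrt(2*pi) ~ 0.40, biased slightly low by smoothing
```

The smoothing parameter h plays the role of the bin width: too small and the estimate is spiky, too large and real structure is washed out.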
K Nearest-Neighbor Method

Place a hypersphere centered at each data point x and allow the radius to grow to a volume V until it contains K data points. Then the density at x is

p(x) = K / (N V)

where N = total number of data points.

If our data set contains N_k points in class C_k and N points in total, then

p(x|C_k) = K_k / (N_k V)

where K_k = number of points of class C_k in the volume V. Bayes' theorem then gives

p(C_k|x) = p(x|C_k) p(C_k) / p(x) = K_k / K
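The final result, p(C_k|x) ≈ K_k/K, says the class posterior is just the class fraction among the K nearest neighbors. A minimal sketch on toy 2-D signal/background clusters (the data and K = 25 are our assumptions):

```python
import numpy as np

def knn_posterior(x, data, labels, K):
    """p(C_k|x) ~= K_k / K: fraction of the K nearest neighbors in class k."""
    dist = np.linalg.norm(data - x, axis=1)
    nearest = labels[np.argsort(dist)[:K]]
    return {k: float(np.mean(nearest == k)) for k in np.unique(labels)}

rng = np.random.default_rng(1)
sig = rng.normal(+1.0, 1.0, size=(500, 2))   # toy signal cluster around (+1, +1)
bkg = rng.normal(-1.0, 1.0, size=(500, 2))   # toy background cluster around (-1, -1)
data = np.vstack([sig, bkg])
labels = np.array(["s"] * 500 + ["b"] * 500)

post = knn_posterior(np.array([1.0, 1.0]), data, labels, K=25)
print(post)  # dominated by "s" deep inside the signal cluster
```

Note that the volume V cancels in K_k/K, so no explicit density normalization is needed for classification.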
Discriminant Approximation with Neural Networks

The output of a feed-forward neural network can approximate the Bayesian posterior probability p(s|x,y) directly, without estimating the class-conditional probabilities:

n(x, y, w) ≈ p(s|x,y) = r / (1 + r)
Calculating the Discriminant

Consider the sum

E(w) = Σ_i [ n(x_i, y_i, w) − d_i ]²

where d_i = 1 for signal, 0 for background, and w = vector of parameters. Then

∂E/∂n = 0  ⇒  n(x, y, w) = p(s|x,y) = r / (1 + r)

in the limit of large data samples, and provided that the function n(x, y, w) is flexible enough.
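This result can be checked numerically: minimize the squared error above with 0/1 targets, and the fitted output approaches the true posterior. A toy check assuming 1-D Gaussian classes at ±1 with equal priors, where the exact posterior is sigmoid(2x), so a one-node "network" n(x) = sigmoid(w1 x + w0) should converge to w1 ≈ 2, w0 ≈ 0 (model, learning rate, and seed are our choices):

```python
import numpy as np

rng = np.random.default_rng(2)
xs = np.concatenate([rng.normal(+1, 1, 2000), rng.normal(-1, 1, 2000)])
ds = np.concatenate([np.ones(2000), np.zeros(2000)])  # d_i = 1 signal, 0 background

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Minimize E(w) = mean_i [n(x_i; w) - d_i]^2 by full-batch gradient descent
# for the minimal model n(x) = sigmoid(w1*x + w0).
w1, w0 = 0.0, 0.0
lr = 1.0
for _ in range(5000):
    n = sigmoid(w1 * xs + w0)
    grad_common = 2.0 * (n - ds) * n * (1.0 - n)   # dE/dn * dn/da
    w1 -= lr * np.mean(grad_common * xs)
    w0 -= lr * np.mean(grad_common)

# For N(+-1, 1) classes with equal priors the exact posterior is sigmoid(2x),
# so the fit should approach w1 ~ 2, w0 ~ 0.
print(w1, w0)
```

The network never sees the class-conditional densities; the 0/1 targets alone drive the output toward r/(1+r), exactly as the slide states.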
Neural Networks

A NN estimates a mapping function without requiring a mathematical description of how the output formally depends on the inputs. The "hidden" transformation functions g adapt themselves to the data as part of the training process. The number of such functions needs to grow only as the complexity of the problem grows.

D_NN = g( Σ_j w_j g( Σ_i w_ij x_i + θ_j ) + θ ),   with g(a) = 1 / (1 + e^{−a})

[Figure: feed-forward network with inputs x1…x4, one hidden layer, and output D_NN]
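The D_NN formula above is a plain forward pass and can be written directly. A minimal sketch with a toy 4-input, 2-hidden-node network and arbitrary illustrative weights (all names and sizes are our choices):

```python
import numpy as np

def sigmoid(a):
    """g(a) = 1 / (1 + e^{-a})."""
    return 1.0 / (1.0 + np.exp(-a))

def d_nn(x, w_hidden, theta_hidden, w_out, theta_out):
    """D_NN = g( sum_j w_j * g( sum_i w_ij x_i + theta_j ) + theta ).
    x: (n_in,); w_hidden: (n_hidden, n_in); w_out: (n_hidden,)."""
    hidden = sigmoid(w_hidden @ x + theta_hidden)   # inner sums over inputs i
    return sigmoid(w_out @ hidden + theta_out)      # outer sum over hidden nodes j

rng = np.random.default_rng(0)
out = d_nn(np.array([0.5, -1.0, 2.0, 0.0]),
           rng.normal(size=(2, 4)), rng.normal(size=2),
           rng.normal(size=2), 0.1)
print(out)  # a single number in (0, 1)
```

Because the output passes through g, D_NN always lies in (0, 1), which is what allows it to be interpreted as a posterior probability.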
Why are NN models powerful?
- Neural networks are universal approximators: with a sufficiently large NN, you can approximate a function to arbitrary accuracy
- Convergence of the approximation is rapid
- High dimensionality is not a curse any more!
- Model complexity can be controlled by regularization
- They extrapolate gracefully
Also, they need to have optimal flexibility/complexity.

Example: Mth-order polynomial fits to data drawn from h(x) = 0.5 + 0.4 sin(2πx):

M = 1 (simple), M = 3 (flexible), M = 10 (highly flexible)

[Figure: the three polynomial fits overlaid on the data]
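The under/overfitting pattern in this example is easy to reproduce. A minimal sketch, with our own choices of sample size, noise level, and seed:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 15)
# Data drawn from h(x) = 0.5 + 0.4 sin(2*pi*x) plus a little noise:
y = 0.5 + 0.4 * np.sin(2 * np.pi * x) + rng.normal(0.0, 0.05, x.size)

rms = {}
for M in (1, 3, 10):
    coeffs = np.polyfit(x, y, deg=M)        # least-squares Mth-order fit
    resid = y - np.polyval(coeffs, x)
    rms[M] = float(np.sqrt(np.mean(resid ** 2)))
    print(f"M={M:2d}: training RMS = {rms[M]:.4f}")
```

Training error always falls as M grows, but M = 10 is chasing the noise; out of sample, the M = 3 fit would generalize best here, which is the point of the slide.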
The Golden Rule
Keep it simple. As simple as possible. Not any simpler.
- Einstein
Measuring the Top Quark Mass

The Discriminants

[Figure: distributions of the discriminant variables; shaded = top. DØ]
Measuring the Top Quark Mass

[Figure: background-rich and signal-rich regions in the discriminant plane]

DØ Lepton+jets: m_t = 173.3 ± 5.6 (stat.) ± 6.2 (syst.) GeV/c²
Strategy for Discovering the Higgs Boson at the Tevatron

P.C. Bhat, R. Gilmartin, H. Prosper, PRD 62 (2000); hep-ph/0001152
WH Results from NN Analysis, M_H = 100 GeV/c²

[Figure: NN distributions, WH vs Wbb]
WH (110 GeV/c²) NN Distributions
Results, Standard vs. NN
A good chance of discovery up to M_H = 130 GeV/c² with 20-30 fb⁻¹
Improving the Higgs Mass Resolution

Use m_jj and H_T (= Σ E_T^jets) to train NNs to predict the Higgs boson mass.

The resolution improves: 13.8% → 12.2%, 13.1% → 11.3%, 13% → 11%
Newer Approaches: Ensembles of Networks
- Committees of Networks: performance can be better than that of the best single network
- Stacks of Networks: control both bias and variance
- Mixtures of Experts: decompose complex problems
Bayesian Reasoning

The Bayesian approach provides a well-founded mathematical procedure for making straightforward and meaningful model comparisons. It also allows treatment of all uncertainties in a consistent manner.

Examples of useful applications:
- Fitting binned data to multi-source models, PLB 407 (1997) 73
- Extraction of the solar neutrino survival probability, PRL 81 (1998) 5056

Bayesian methods are mathematically linked to adaptive algorithms such as Neural Networks (NN). Hybrid methods involving NNs for probability density estimation with a Bayesian treatment can be very powerful.
Summary
Multivariate methods have already made an impact on discoveries and precision measurements, and will be the methods of choice in future analyses.
We have only scratched the surface in our use of advanced analysis algorithms.
Hybrid methods combining "intelligent" algorithms and probabilistic approaches will be the wave of the future!