Modern Methods of Data Analysis - WS 07/08 Stephanie Hansmann-Menzemer
Modern Methods of Data Analysis
Lecture XIIb (14.01.08)

Contents:
● Multi-Variate Analysis Methods (MVA)
Multi-Variate Analysis Methods
● Univariate – only one attribute, represented by one variable in the data, is considered
● Multivariate – several attributes in the data are treated simultaneously, i.e. several variables are investigated at the same time
Considering attributes separately in a multi-dimensional data set can lead to wrong conclusions due to correlations:
– imagine a data set of dental patients with the following attributes: age, sex, family status, education
– a univariate analysis of tooth loss finds that loss depends strongly on age, but also on education and family status. For instance, married people have fewer teeth than singles. Is being married bad for your teeth???
– a multivariate analysis takes the influences/correlations of the different factors/attributes into account => married people are in general older than singles
– factoring out the attribute age => married people in fact have better teeth than singles
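The confounding effect described above can be sketched with made-up numbers (a hypothetical toy data set, not real patient data): marriage correlates with age, and tooth count depends mainly on age.

```python
# Toy illustration of confounding: all numbers are invented.
def mean(xs):
    return sum(xs) / len(xs)

# (age, married, number of teeth) -- hypothetical records
patients = [
    (25, False, 30), (30, False, 29), (35, False, 29), (40, False, 28),
    (35, True, 30), (45, True, 28), (55, True, 26), (65, True, 24),
]

# univariate view: married patients seem to have fewer teeth
married = [t for a, m, t in patients if m]
single = [t for a, m, t in patients if not m]
print(mean(married), mean(single))

# factor out age: compare only within the overlapping age band 30-45
married_young = [t for a, m, t in patients if m and 30 <= a <= 45]
single_young = [t for a, m, t in patients if not m and 30 <= a <= 45]
print(mean(married_young), mean(single_young))
```

In the marginal comparison the married group looks worse, but within the common age band the ordering reverses, because the married patients are on average older.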
Event Classification
● How to exploit the information present in the discriminating variables?
● Often, a lot of information also lies in the correlations between the variables
● Different techniques try to exploit (all) these features in different ways
=> compare and choose!
Classification of Methods
● Discriminant Analysis (mainly used)
– discriminate between different groups in the data, e.g. signal and background (training events are classified as signal or background)
– find a discriminant or test statistic to separate the classes
– reduce the dimensionality (feature space) of the data => simplifies optimization
● Cluster Analysis
– find groups (clusters) of similar attributes in the data
– contrary to discriminant methods, the cluster attributes are not known a priori
● Principal Component Analysis (often used for preprocessing)
– find new (uncorrelated) variables through transformations
– the new (reduced) set of variables are the principal components and describe the main features of the data
– reduce the dimensionality of the data
Use in Particle Physics
● Since the early 1990's multivariate methods have been used in physics
● e.g. event selection for the single top analysis 2006
● Physics analysis tasks are inherently multivariate:
– event selection (trigger)
– event reconstruction (tracking, vertexing, particle ID)
– signal/background discrimination
– parameter estimation
– ...
Discrimination with the LH Ratio
● Neyman-Pearson: optimal separation between two hypotheses (e.g. signal & background) via a cut on the likelihood ratio:

  λ(x) = f(x|S) / f(x|B) > c

● The acceptance region with the largest power (highest signal purity) at a given significance level is given by the likelihood ratio
● However, the pdfs have to be known analytically, which is not possible in many cases. In practice, MC simulation is used to approximate the pdfs with multidimensional histograms
● The method becomes impractical with too many input variables – finite MC statistics
– MVA: alternatives which have the potential to come close to the LH ratio in separation power
– e.g. often done, however not optimal: factorize the likelihood, which means neglecting correlations!
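The histogram approach above can be sketched in one dimension (a minimal toy, assuming Gaussian "MC" samples; all numbers are illustrative):

```python
# Approximate f(x|S) and f(x|B) with 1D histograms filled from MC samples,
# then cut on their ratio. Toy distributions, not a real analysis.
import numpy as np

rng = np.random.default_rng(42)
sig = rng.normal(1.0, 1.0, 10000)    # "MC" signal sample
bkg = rng.normal(-1.0, 1.0, 10000)   # "MC" background sample

edges = np.linspace(-5, 5, 51)
h_sig, _ = np.histogram(sig, bins=edges, density=True)
h_bkg, _ = np.histogram(bkg, bins=edges, density=True)

def lh_ratio(x):
    """Histogram estimate of lambda(x) = f(x|S) / f(x|B)."""
    i = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, len(h_sig) - 1)
    return h_sig[i] / np.maximum(h_bkg[i], 1e-12)  # guard against empty bins

# accept events with lambda(x) > c; the cut value c sets the significance level
print(lh_ratio(2.0) > 1.0, lh_ratio(-2.0) > 1.0)
```

With many input variables the same histograms would need exponentially many bins, which is exactly the finite-MC-statistics problem mentioned above.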
Linear Discriminant Analysis (I)
● The simplest form of test statistic is a linear function of the input variables x:

  t(x) = aᵀx = Σᵢ aᵢ xᵢ

● How to compute the optimal components of a?

mean value of the signal/background data (k = S, B):

  (μ_k)ᵢ = ∫ xᵢ f(x|H_k) dx

covariance matrix of the signal/background data:

  (V_k)ᵢⱼ = ∫ (x − μ_k)ᵢ (x − μ_k)ⱼ f(x|H_k) dx
Linear Discriminant Analysis (II)
mean value of the test statistic t for signal and background (k = S, B):

  τ_k = aᵀμ_k

variance of the test statistic t for signal and background:

  Σ_k² = aᵀV_k a

We want to maximize (τ_S − τ_B)² to get good separation between signal and background, and we want the data to be concentrated as closely as possible around the mean values => minimize Σ_S² + Σ_B²:

  J(a) = (τ_S − τ_B)² / (Σ_S² + Σ_B²)

Fisher discriminant
Fisher Discriminant
Maximizing the separation yields the Fisher discriminant direction a ∝ W⁻¹(μ_S − μ_B), where B is the between-class matrix and W = V_S + V_B the within-class matrix.

B, W only depend on the signal and background samples
=> a can be computed from the S & B samples, without knowledge of the pdfs!
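This is easy to do directly from samples (a minimal sketch with toy 2D Gaussian samples; the numbers are illustrative):

```python
# Compute the Fisher direction a ~ W^-1 (mu_S - mu_B) from samples.
import numpy as np

rng = np.random.default_rng(0)
S = rng.multivariate_normal([2, 1], [[1, 0.5], [0.5, 1]], size=5000)  # signal
B = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=5000)  # background

mu_S, mu_B = S.mean(axis=0), B.mean(axis=0)
W = np.cov(S.T) + np.cov(B.T)         # within-class matrix from the samples
a = np.linalg.solve(W, mu_S - mu_B)   # Fisher direction (defined up to scale)

t_S, t_B = S @ a, B @ a               # test statistic t = a^T x per event
print(t_S.mean() > t_B.mean())        # signal is concentrated at higher t
```

No pdf was needed anywhere: only the sample means and covariances enter.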
Fisher Discriminant for Gaussian Distributions
● Assume multidimensional Gaussian distributions for S and B with different means μ_S, μ_B but a common covariance matrix V.
● The exponential function is monotonic, thus cutting on t gives the same separation power as cutting on the LH ratio! In this case the Fisher discriminant is the optimal test statistic!
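The equivalence can be checked directly (a sketch of the standard derivation, assuming a common covariance matrix V):

```latex
f(\vec{x}\,|\,k) \propto \exp\!\left(-\tfrac{1}{2}(\vec{x}-\vec{\mu}_k)^T V^{-1}(\vec{x}-\vec{\mu}_k)\right), \qquad k = S, B

\ln \lambda(\vec{x}) = \ln\frac{f(\vec{x}\,|\,S)}{f(\vec{x}\,|\,B)}
 = (\vec{\mu}_S-\vec{\mu}_B)^T V^{-1}\,\vec{x}
 \;-\; \tfrac{1}{2}\!\left(\vec{\mu}_S^T V^{-1}\vec{\mu}_S - \vec{\mu}_B^T V^{-1}\vec{\mu}_B\right)
```

The quadratic terms in x cancel, so ln λ(x) is linear in x with coefficient vector V⁻¹(μ_S − μ_B), i.e. along the Fisher direction; a cut on t = aᵀx is therefore equivalent to a cut on the likelihood ratio.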
Introduction to Neural Networks
● In many areas computers are significantly faster (“better”) than the human brain:
– e.g. all kinds of computations & numerical algorithms
● However, in some areas the human brain is significantly better:
– e.g. pattern recognition (handwriting, face recognition)
● The human brain is much more robust against (partially) missing information (shadow/fog on a picture etc.)
● Neural nets are motivated by the structure of the brain (“neurons”)
Model of an Artificial Neuron: “Unit”
● McCulloch-Pitts unit

[Diagram: inputs x[1], x[2], x[3], … arrive via input links with weights a[i,j], are summed to Σ a[i,j] x[i], passed through the activation function g[j], and sent out via the output links.]

x[j] = g[j](Σ a[i,j] x[i])
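A single unit can be written down in a few lines (a minimal sketch; the AND weights and threshold are made up for illustration):

```python
# A McCulloch-Pitts unit: weighted sum of inputs through an activation g.
def step(z, threshold=0.0):
    """Threshold activation: fire (1) if the summed input reaches the threshold."""
    return 1.0 if z >= threshold else 0.0

def unit(inputs, weights, g=step):
    z = sum(a * x for a, x in zip(weights, inputs))  # in_j = sum_i a[i,j] x[i]
    return g(z)                                      # x[j] = g(in_j)

# a unit computing logical AND: both inputs must fire to reach threshold 1.5
and_weights = [1.0, 1.0]
print(unit([1, 1], and_weights, lambda z: step(z, 1.5)))  # 1.0
print(unit([1, 0], and_weights, lambda z: step(z, 1.5)))  # 0.0
```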
Activation Function
● Simple case: step function (threshold)
● Mainly used: sigmoid or tanh(x)
– differentiable, which is useful for optimization
– useful to describe non-linear functions

sigmoid: g(x) = 1 / (1 + e⁻ˣ)
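The differentiability point can be made concrete: the sigmoid has the closed-form derivative g′(z) = g(z)(1 − g(z)), which gradient-based optimization exploits (a small numerical check; the evaluation point 0.3 is arbitrary):

```python
# Step vs sigmoid: only the sigmoid is smooth and differentiable.
import math

def step(z):
    return 1.0 if z >= 0 else 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# numerical vs analytic derivative of the sigmoid at z = 0.3
h = 1e-6
numeric = (sigmoid(0.3 + h) - sigmoid(0.3 - h)) / (2 * h)
analytic = sigmoid(0.3) * (1 - sigmoid(0.3))
print(abs(numeric - analytic) < 1e-6)  # True
```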
Structure of Neural Nets (I)
● One-directional flow of information: “feed-forward” net
● Typically ordered in “layers”:
– input layer, output layer
– hidden layers
● One hidden layer + sigmoid activation function is enough to approximate any continuous function to arbitrary precision
● Two hidden layers + sigmoid activation function can describe any function to arbitrary precision
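A flavour of why one hidden layer suffices: two sigmoid hidden units already build a localized “bump”, the building block of such approximations (a sketch with hand-picked, untrained weights):

```python
# Feed-forward net with one hidden sigmoid layer and a linear output unit.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def net(x, hidden, out_weights):
    """hidden: list of (weight, bias) per hidden unit; linear output unit."""
    h = [sigmoid(w * x + b) for w, b in hidden]
    return sum(v * hi for v, hi in zip(out_weights, h))

# bump ~1 on [1, 2] and ~0 elsewhere: sigma(10(x-1)) - sigma(10(x-2))
hidden = [(10.0, -10.0), (10.0, -20.0)]
out = [1.0, -1.0]
print(net(1.5, hidden, out) > 0.9)          # inside the bump
print(abs(net(4.0, hidden, out)) < 0.01)    # far outside the bump
```

Summing many such bumps with suitable heights approximates any continuous function on an interval.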
Structure of Neural Nets (II)
● A net with no hidden layers and a step activation function cuts on a weighted sum of the inputs, i.e. on a hyperplane (in n = 3 dimensions: a plane).
● No hidden layers: only linear separation is possible – equivalent to Fisher discriminant analysis