Modern Methods of Data Analysis - physi.uni-heidelberg.demenzemer/Stat0708/statistik... · Modern Methods of Data Analysis - WS 07/08 Stephanie Hansmann-Menzemer Multi-Variate Analysis

Modern Methods of Data Analysis - WS 07/08 Stephanie Hansmann-Menzemer

Modern Methods of Data Analysis

Lecture XIIb (14.01.08)

Contents:

● Multi-Variate Analysis Methods (MVA)


Multi-Variate Analysis Methods

● Univariate – only one attribute, represented by one variable in the data, is considered

● Multivariate – several attributes in the data are treated simultaneously, e.g. several variables are investigated at the same time

Considering attributes separately in a multi-dimensional data set can lead to wrong conclusions due to correlation

– imagine a data set of dental patients with the following attributes: age, sex, family status, education

– investigate tooth loss in a univariate analysis: find that loss depends strongly on age, but also on education and family status. For instance, married people have fewer teeth than singles. Is being married bad for your teeth???

– in a multivariate analysis the influence of and correlations between different factors/attributes are taken into account => married people are in general older than singles

– Factoring out the attribute age => married people have in fact better teeth than singles
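The dental example can be made concrete with a small simulation. The sketch below is not from the lecture: the population, the age dependence of marriage and the size of every effect are invented for illustration. It compares the naive univariate difference with a least-squares fit that includes age and marital status together.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Synthetic population (all numbers are invented for illustration).
age = rng.uniform(20, 80, n)
# Probability of being married rises with age.
married = rng.random(n) < (age - 10) / 80
# Teeth lost grow with age; being married slightly *reduces* the loss.
teeth_lost = 0.2 * (age - 20) - 1.0 * married + rng.normal(0, 1.5, n)

# Univariate view: compare married and single people directly.
univariate_diff = teeth_lost[married].mean() - teeth_lost[~married].mean()

# Multivariate view: least-squares fit with age and marital status together.
X = np.column_stack([np.ones(n), age, married.astype(float)])
coef, *_ = np.linalg.lstsq(X, teeth_lost, rcond=None)
married_effect = coef[2]

print(f"univariate difference (married - single): {univariate_diff:+.2f}")
print(f"effect of marriage at fixed age:          {married_effect:+.2f}")
```

In this toy setup the univariate difference comes out positive (married people appear to lose more teeth), while the coefficient at fixed age is negative, because the married group is on average older.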


Event Classification

● How to exploit the information present in the discriminating variables?

● Often, a lot of information is also given by the correlations between variables

● Different techniques exploit the (full set of) features in different ways

=> compare and choose


Methods Classification

● Discriminant Analysis (mainly used)

– discriminate between different groups in data, e.g. signal and background (training events are classified as signal or background)

– find discriminant or test statistics to separate classes

– reduce dimensionality (feature space) of data => simplifies optimization

● Cluster Analysis

– find groups (clusters) of similar attributes in the data

– contrary to the discriminant method, the cluster attributes are not known a priori

● Principal Component Analysis (often used for preprocessing)

– find new (uncorrelated) variables through transformations

– the new (reduced) set of variables are the principal components and describe the main features in the data

– reduce dimensionality of the data
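As an illustration of the principal component step, the following sketch diagonalises the covariance matrix of a toy data set with NumPy. The data and all variable names are assumptions made for this example; the lecture does not prescribe a particular implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: two strongly correlated variables plus one independent variable.
n = 5000
x1 = rng.normal(0, 1, n)
x2 = 0.9 * x1 + rng.normal(0, 0.3, n)
x3 = rng.normal(0, 0.5, n)
data = np.column_stack([x1, x2, x3])

# 1. Centre the data and compute the covariance matrix.
centred = data - data.mean(axis=0)
cov = np.cov(centred, rowvar=False)

# 2. Eigen-decomposition; eigh returns eigenvalues in ascending order.
eigval, eigvec = np.linalg.eigh(cov)
order = np.argsort(eigval)[::-1]
eigval, eigvec = eigval[order], eigvec[:, order]

# 3. Project onto the eigenvectors: the new variables are uncorrelated.
components = centred @ eigvec
cov_pc = np.cov(components, rowvar=False)

# 4. Keep only the leading components to reduce the dimensionality.
explained = eigval / eigval.sum()
reduced = components[:, :2]

print("fraction of variance per component:", np.round(explained, 3))
```

The covariance matrix of the projected data is diagonal up to rounding, and the fractions in `explained` indicate how many components are worth keeping.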


Use in Particle Physics

● multivariate methods have been used in physics since the early 1990s

● e.g.: event selection for single top analysis 2006

● physics analysis tasks are inherently multivariate

– event selection (trigger)

– event reconstruction (tracking, vertexing, particle ID)

– signal/background discrimination

– parameter estimation

– ...


Discrimination with LH ratio

● Neyman-Pearson: optimal separation between 2 hypotheses (e.g. signal & background) via a cut on the likelihood ratio, where $f(\vec{x}|H_s)$ and $f(\vec{x}|H_b)$ are the pdfs of the input variables $\vec{x}$ for signal and background and $c$ is the cut value:

$$ r(\vec{x}) = \frac{f(\vec{x}|H_s)}{f(\vec{x}|H_b)} > c $$

● For a given significance level, the acceptance region with the largest power (highest signal purity) is the one defined by the “likelihood ratio”

● However, the pdfs have to be known analytically, which is not possible in many cases. In practice, MC simulation is used to approximate the pdfs by multidimensional histograms

● The method is impractical if there are too many input variables

– finite MC statistics

– MVA: alternatives which have the potential to come close in separation power to the LH ratio

– e.g. often done, however not optimal: factorize the likelihood, which means neglecting correlations!
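The factorized approach can be sketched as follows: each pdf is approximated by a product of one-dimensional histograms filled from simulated events, so correlations between the variables are ignored. The toy "MC" samples, the binning and the cut value are assumptions chosen for this illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample(mean, n):
    """Toy 'MC' sample: two correlated Gaussian variables (invented numbers)."""
    cov = [[1.0, 0.6], [0.6, 1.0]]
    return rng.multivariate_normal(mean, cov, size=n)

train_sig, train_bkg = sample([1.0, 1.0], 50_000), sample([0.0, 0.0], 50_000)

edges = np.linspace(-5.0, 6.0, 45)  # common binning for both variables

def pdf_1d(values):
    """Normalised 1D histogram as pdf estimate; small floor avoids empty bins."""
    hist, _ = np.histogram(values, bins=edges, density=True)
    return np.maximum(hist, 1e-6)

sig_pdfs = [pdf_1d(train_sig[:, i]) for i in range(2)]
bkg_pdfs = [pdf_1d(train_bkg[:, i]) for i in range(2)]

def factorised_lr(x):
    """Product of 1D pdf ratios: correlations between variables are neglected."""
    ratio = np.ones(len(x))
    for i in range(2):
        # Bin index of each event, clipped so under/overflow use the edge bins.
        idx = np.clip(np.digitize(x[:, i], edges) - 1, 0, len(edges) - 2)
        ratio *= sig_pdfs[i][idx] / bkg_pdfs[i][idx]
    return ratio

test_sig, test_bkg = sample([1.0, 1.0], 10_000), sample([0.0, 0.0], 10_000)
cut = 1.0  # accept events with likelihood ratio above the cut
sig_eff = np.mean(factorised_lr(test_sig) > cut)
bkg_eff = np.mean(factorised_lr(test_bkg) > cut)
print(f"signal efficiency {sig_eff:.2f}, background efficiency {bkg_eff:.2f}")
```

With n input variables a full histogram needs a number of bins that grows exponentially with n, whereas the factorized version needs only n one-dimensional histograms per class; the price is that the correlation built into the toy samples is not used.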


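Linear Discriminant Analysis (I)

● Simplest form for the test statistic is a linear function of the input variables $\vec{x} = (x_1, \dots, x_n)$:

$$ t(\vec{x}) = \sum_{i=1}^{n} a_i x_i = \vec{a}^T \vec{x} $$

● How to compute the optimal components of $\vec{a}$?

mean value of signal/background data ($k = s, b$):

$$ (\mu_k)_i = \int x_i \, f(\vec{x}|H_k) \, d\vec{x} $$

covariance matrix of signal/background data:

$$ (V_k)_{ij} = \int (x - \mu_k)_i \, (x - \mu_k)_j \, f(\vec{x}|H_k) \, d\vec{x} $$


Linear Discriminant Analysis (II)

mean value of the test statistic t for signal and background, with $g(t|H_k)$ the pdf of t:

$$ \tau_k = \int t \, g(t|H_k) \, dt = \vec{a}^T \vec{\mu}_k $$

variance of the test statistic t for signal and background:

$$ \Sigma_k^2 = \int (t - \tau_k)^2 \, g(t|H_k) \, dt = \vec{a}^T V_k \, \vec{a} $$

We would like to maximize $|\tau_s - \tau_b|$ to get good separation between signal and background. We would also like the data to be as closely concentrated around the mean values as possible => minimize $\Sigma_s^2$ and $\Sigma_b^2$

Fisher discriminant: choose $\vec{a}$ such that the following ratio is maximal

$$ J(\vec{a}) = \frac{(\tau_s - \tau_b)^2}{\Sigma_s^2 + \Sigma_b^2} $$


Fisher Discriminant

$$ (\tau_s - \tau_b)^2 = \sum_{i,j} a_i a_j (\mu_s - \mu_b)_i (\mu_s - \mu_b)_j = \vec{a}^T B \, \vec{a} $$

$$ \Sigma_s^2 + \Sigma_b^2 = \sum_{i,j} a_i a_j (V_s + V_b)_{ij} = \vec{a}^T W \vec{a} $$

$$ J(\vec{a}) = \frac{\vec{a}^T B \, \vec{a}}{\vec{a}^T W \vec{a}} \quad \Rightarrow \quad \vec{a} \propto W^{-1} (\vec{\mu}_s - \vec{\mu}_b) $$

B and W depend only on the means and covariance matrices of the signal and background samples

=> $\vec{a}$ can be computed from the S & B samples, without knowledge of the pdfs!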
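A minimal sketch of how the Fisher coefficients could be computed from training samples, assuming the standard solution in which the coefficient vector is proportional to W^{-1}(mu_s - mu_b) with W = V_s + V_b. The toy samples and all function names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy training samples (invented numbers): two correlated input variables.
cov = [[1.0, 0.7], [0.7, 1.0]]
signal = rng.multivariate_normal([1.0, 0.0], cov, size=5000)
background = rng.multivariate_normal([0.0, 1.0], cov, size=5000)

# Sample means and covariance matrices replace the integrals over the pdfs.
mu_s, mu_b = signal.mean(axis=0), background.mean(axis=0)
V_s = np.cov(signal, rowvar=False)
V_b = np.cov(background, rowvar=False)

# W = V_s + V_b; coefficients proportional to W^{-1} (mu_s - mu_b).
W = V_s + V_b
a = np.linalg.solve(W, mu_s - mu_b)

def fisher_j(coeffs):
    """Separation J = (tau_s - tau_b)^2 / (Sigma_s^2 + Sigma_b^2)."""
    tau_s, tau_b = coeffs @ mu_s, coeffs @ mu_b
    return (tau_s - tau_b) ** 2 / (coeffs @ W @ coeffs)

# Test statistic t = a^T x for every event.
t_sig, t_bkg = signal @ a, background @ a

print("Fisher coefficients:", a)
print("J(Fisher) =", fisher_j(a), " J(x1 only) =", fisher_j(np.array([1.0, 0.0])))
```

Only sample means and covariances enter the calculation, and `np.linalg.solve` avoids forming the inverse of W explicitly. Any other direction, such as using the first variable alone, gives a value of J that is no larger than the Fisher one.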


Fisher Discriminant for Gaussian Distribution

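● Assume multidimensional Gaussian distributions for S and B with a common covariance matrix $V_s = V_b = V$ and means $\vec{\mu}_s$, $\vec{\mu}_b$:

$$ f(\vec{x}|H_k) \propto \exp\left[ -\frac{1}{2} (\vec{x} - \vec{\mu}_k)^T V^{-1} (\vec{x} - \vec{\mu}_k) \right], \quad k = s, b $$

● The terms quadratic in $\vec{x}$ cancel in the likelihood ratio:

$$ \frac{f(\vec{x}|H_s)}{f(\vec{x}|H_b)} \propto \exp\left[ (\vec{\mu}_s - \vec{\mu}_b)^T V^{-1} \vec{x} \right] $$

The exponent is proportional to the Fisher discriminant $t = \vec{a}^T \vec{x}$ with $\vec{a} \propto V^{-1} (\vec{\mu}_s - \vec{\mu}_b)$. The exponential function is monotonic, thus cutting on t gives the same separation power as cutting on the LH ratio! In this case the Fisher discriminant is the optimal test statistic!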


Introduction Neural Networks:

● Many areas where computers are significantly faster (“better”) than the human brain:

– e.g. all kinds of computation & numerical algorithms

● However, in some areas the human brain is significantly better:

– e.g. pattern recognition (handwriting, face recognition)

● The human brain is much more robust against (partially) missing information (shadow/fog on a picture etc.)

● Neural nets are motivated by the structure of the brain (“neurons”)


Model of Artificial Neuron: “Unit”

● McCulloch-Pitts Unit

[Figure: unit j with input links x[1], x[2], x[3], ..., input function Σ a[i,j] x[i], activation function g[j], output x[j] and output links]

x[j] = g[j](Σ a[i,j] x[i])   (sum over the inputs i)
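The unit formula x[j] = g[j](Σ a[i,j] x[i]) translates almost directly into code. In the sketch below the function names, the constant bias input and the particular weights are assumptions chosen so that a single step-function unit behaves like a logical AND.

```python
import numpy as np

def step(z, threshold=0.0):
    """Step activation: the unit fires (1) when the input exceeds the threshold."""
    return np.where(z > threshold, 1.0, 0.0)

def unit_output(x, a_j, g_j=step):
    """Output of unit j: x[j] = g[j](sum_i a[i,j] * x[i])."""
    return g_j(np.dot(a_j, x))

# Hypothetical weights: with a constant bias input x[0] = 1 the unit acts as a logical AND.
a_and = np.array([-1.5, 1.0, 1.0])
for x1 in (0, 1):
    for x2 in (0, 1):
        out = unit_output(np.array([1.0, x1, x2]), a_and)
        print(x1, x2, "->", int(out))
```

Treating the threshold as a weight on a constant input keeps the unit a pure weighted sum followed by the activation function.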


Activation Function

● Simple case: step function (threshold)

● Mainly used: sigmoid or tanh(x)

– differentiable, which is useful for optimization

– useful to describe non-linear functions

[Figure: sigmoid function]
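To show why differentiability matters, the sketch below writes the sigmoid and tanh derivatives in closed form and checks the sigmoid derivative against a finite difference. It assumes the logistic form 1/(1 + exp(-x)) for the sigmoid; the function names are chosen for this example.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid 1 / (1 + exp(-x)), values in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_prime(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - np.tanh(x) ** 2

x = np.linspace(-4.0, 4.0, 9)
# Check the analytic derivative against a central finite difference.
h = 1e-5
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
print("largest deviation:", np.max(np.abs(numeric - sigmoid_prime(x))))
```

Gradient-based optimization of the weights relies on such derivatives. A step function has zero slope everywhere except at the threshold, so it gives the optimizer no information about which direction improves the output.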


Structure of Neural Nets (I)

● One-directional flow of information: “feed forward” net

● Typically ordered in “layers”

– input layer, output layer

– hidden layers

● One hidden layer + sigmoid activation function is enough to approximate any continuous function to arbitrary precision

● Two hidden layers + sigmoid activation function are enough to describe any function to arbitrary precision


Structure of Neural Nets (II)

● Nets with no hidden layers and step function:

and in n=3 dimensions:

no hidden layers: only linear separation possible, equivalent to Fisher discriminant analysis


Structure of Neural Nets (III)

1 hidden layer & step function: convex polygons

2 hidden layers & step function: overlay of several convex polygons, here: A and not B

additional layers do not add any further capability
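The layer hierarchy can be illustrated with hand-set weights. In this sketch, which is an assumption-laden toy and not taken from the lecture, one hidden layer of four half-plane units selects a square (a convex polygon), and a second hidden layer combines two such squares into "A and not B". The regions A = [0,4] x [0,4] and B = [1,2] x [1,2] are arbitrary choices.

```python
import numpy as np

def step(z):
    return (z > 0).astype(float)

def layer(x, weights, bias):
    """One layer of step-function units: step(x @ weights + bias)."""
    return step(x @ weights + bias)

def inside_square(x, lo, hi):
    """1 hidden layer: four half-plane units, output unit ANDs them (convex region)."""
    w = np.array([[1.0, -1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, -1.0]])
    b = np.array([-lo, hi, -lo, hi])       # x>lo, x<hi, y>lo, y<hi
    hidden = layer(x, w, b)
    return step(hidden.sum(axis=1) - 3.5)  # fires only if all four units fire

def a_and_not_b(x):
    """2 hidden layers: combine two convex regions, A = [0,4]^2 and B = [1,2]^2."""
    second = np.column_stack([inside_square(x, 0.0, 4.0),
                              inside_square(x, 1.0, 2.0)])
    return step(second @ np.array([1.0, -1.0]) - 0.5)  # A and not B

points = np.array([[3.0, 3.0],   # in A, outside B
                   [1.5, 1.5],   # in A and in B
                   [5.0, 5.0]])  # outside A
print(a_and_not_b(points))
```

The selected region is a square with a square hole, which is not convex and therefore cannot be produced by the single-hidden-layer construction used in `inside_square`.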
