Numerical Methods for Data Analysis
Michael O. [email protected]
Bosen (Saar), August 29 - September 3, 2010
Fundamentals
  - Probability distributions
  - Expectation values, error propagation
  - Parameter estimation
Regression analysis
  - Maximum likelihood
  - Linear regression
Advanced topics
Some statistics books, papers, etc.
Volker Blobel und Erich Lohrmann: Statistische und numerische Methoden der Datenanalyse, Teubner Verlag (1998)
Siegmund Brandt: Datenanalyse, BI Wissenschaftsverlag (1999)
Philip R. Bevington: Data Reduction and Error Analysis for the Physical Sciences, McGraw-Hill (1969)
Roger J. Barlow: Statistics, John Wiley & Sons (1993)
Glen Cowan: Statistical Data Analysis, Oxford University Press (1998)
Frederick James: Statistical Methods in Experimental Physics, 2nd Edition, World Scientific (2006)
Wes Metzger’s lecture notes: www.hef.kun.nl/~wes/stat_course/statist.pdf
Glen Cowan’s lecture notes: www.pp.rhul.ac.uk/~cowan/stat_course.html
Particle Physics Booklet: http://pdg.lbl.gov/
Introduction
Data analysis in nuclear and particle physics
Observe events of a certain type.
Measure characteristics of each event.
Theories predict distributions of these properties up to free parameters.
Some tasks of data analysis:
  - Estimate (measure) the parameters;
  - Quantify the uncertainty of the parameter estimates;
  - Test the extent to which the predictions of a theory are in agreement with the data.
Introduction
Philosophy of Science
Karl R. Popper (born 28 July 1902 in Vienna, Austria; died 17 September 1994 in London, England) coined the term critical rationalism. At the heart of his philosophy of science lies the account of the logical asymmetry between verification and falsifiability. Logik der Forschung, 1934.

→ Existence of a true value of measured quantities and derived values.
Theory of probability
Probability theory, mathematics:
→ Kolmogorov axioms

Classical interpretation, frequentist probability:
Pragmatic definition of probability:

    p(E) = lim_{N→∞} n/N

    n = number of occurrences of the event E
    N = number of trials (experiments)

Experiments have to be repeatable (in principle).
Disadvantage: strictly speaking, one cannot make statements about the probability of any true value. Only upper and lower limits are possible at a given confidence level.
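The frequentist definition p(E) = lim n/N can be made concrete with a small simulation. This is an illustrative sketch only; the event (a die showing a six), the seed, and the trial counts are my own choices, not from the lecture:

```python
import random

# Frequentist estimate of p(E) for the event "a die shows a six":
# the relative frequency n/N approaches p(E) = 1/6 as N grows.
def empirical_probability(trials, p_true=1/6, seed=42):
    rng = random.Random(seed)
    n = sum(1 for _ in range(trials) if rng.random() < p_true)
    return n / trials

for N in (100, 10_000, 1_000_000):
    print(N, empirical_probability(N))
```

The estimate fluctuates for small N and settles near 1/6 for large N, which is exactly the "repeatable experiments" requirement of the frequentist picture.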
Theory of probability
Probability theory, mathematics
Classical interpretation, frequentist probability
Bayesian statistics, subjective probability:
Prior subjective assumptions enter into the calculation of the probability of a hypothesis H.

    p(H) = degree of belief that H is true

Metaphorically speaking: probabilities are the ratio of the (maximum) wager and the anticipated prize in a bet.
Suppose there is a town with green and yellow taxicabs. In a hit-and-run accident a man was hurt and a witness saw a green cab.
In court the lawyer of the taxi company impeaches the credibility of the witness because of the lighting conditions: a test showed that under similar conditions 10% of the witnesses confuse the color of the cabs.
Would you believe the witness?
What if there were 20 times more yellow cabs than green cabs? Would you still believe the witness?
taxicabs      witness says ...     "green" statements are ...
200 yellow    180 × "yellow"
              20 × "green"         20/29 = 69% wrong
10 green      9 × "green"          9/29 = 31% true
              1 × "yellow"
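The table's numbers follow from Bayes' theorem. A minimal sketch of that computation, assuming the slide's scenario (200 yellow and 10 green cabs, 10% colour confusion); the function name is my own:

```python
# Posterior probability that the cab really was green, given the
# witness says "green" (Bayes' theorem with the slide's numbers).
def posterior_green(n_green, n_yellow, p_correct=0.9):
    p_green = n_green / (n_green + n_yellow)
    p_yellow = 1.0 - p_green
    # p(says "green") = p(green)·p_correct + p(yellow)·(1 − p_correct)
    p_says_green = p_green * p_correct + p_yellow * (1 - p_correct)
    return p_green * p_correct / p_says_green

print(posterior_green(10, 200))  # 9/29 ≈ 0.31: the witness is probably wrong
```

With 20 times more yellow than green cabs, the prior dominates the 90% reliability of the witness.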
Disadvantage: prior hypotheses influence the probability.
Advantages for rare and one-time events, like noisy signals or catastrophe modeling.

In this lecture we will focus on classical statistics, i.e. error estimates have to be understood as confidence regions.
Combining probabilities
Two kinds of events are given: A and B. The probability of A is p(A) (of B: p(B)). Then the probability of A or B is:

    p(A or B) = p(A) + p(B) − p(A and B)

If A and B are mutually exclusive, then p(A and B) = 0.
Example: drawing from a deck of 32 German Skat cards:

    p(ace or spades) = 4/32 + 8/32 − 1/32 = 11/32

Special case: B = Ā (A does NOT occur):

    p(A or Ā) = p(A) + p(Ā) = 1
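The Skat example can be checked by brute-force enumeration. A small sketch, with rank and suit labels of my own choosing for the 32-card deck:

```python
from fractions import Fraction

# Inclusion-exclusion on a 32-card Skat deck: 4 aces, 8 spades,
# and the ace of spades counted in both sets.
ranks = ["7", "8", "9", "10", "J", "Q", "K", "A"]
suits = ["clubs", "spades", "hearts", "diamonds"]
deck = [(r, s) for r in ranks for s in suits]

hits = [c for c in deck if c[0] == "A" or c[1] == "spades"]
p = Fraction(len(hits), len(deck))
print(p)  # 11/32, matching 4/32 + 8/32 - 1/32
```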
Combining probabilities
Joint probability of A and B occurring simultaneously:

    p(A and B) = p(A) · p(B|A),

where p(B|A) is called the conditional probability. If A and B are independent, one gets p(B|A) = p(B), and thus

    p(A and B) = p(A) · p(B)
Death in the mountains
In a book on the mountaineering achievements of Reinhold Messner one reads the following: “If you consider that the probability of dying in an expedition to an eight-thousander is 3.4%, then Messner had a probability of 3.4% · 29 = 99% of being killed during his 29 expeditions.”
That cannot be right: what if Messner had set off on a 30th expedition?
The probability of surviving one expedition is obviously 1 − 0.034 = 0.966. If one assumes that the various expeditions are independent events, the probability of surviving all 29 expeditions is: P = 0.966^29 = 0.367.
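The slide's argument can be put directly into code, contrasting the correct product of independent survival probabilities with the book's naive n·p figure (the function name is my own):

```python
# Survival of n independent expeditions, each with death probability p,
# versus the book's naive n*p "calculation".
def p_survive_all(p_death, n):
    return (1.0 - p_death) ** n

p, n = 0.034, 29
print(f"naive n*p of dying: {n * p:.3f}")   # already exceeds 1 for n >= 30
print(f"p(survive all 29):  {p_survive_all(p, n):.3f}")
```

The naive product is not even a probability for large n, while the correct survival probability simply keeps shrinking toward zero.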
Definitions
Probability mass function (pmf) and probability density function (pdf) of a measured value (= random variable):

[Figure: a discrete pmf f(n) versus n, and a continuous pdf f(x) versus x]

f(n) discrete, f(x) continuous.

Normalization:

    f(n) ≥ 0,   Σ_n f(n) = 1
    f(x) ≥ 0,   ∫_{−∞}^{∞} f(x) dx = 1

Probability:

    p(n1 ≤ n ≤ n2) = Σ_{n=n1}^{n2} f(n)
    p(x1 ≤ x ≤ x2) = ∫_{x1}^{x2} f(x) dx
Definitions
Cumulative distribution function (CDF):
    F(x) = ∫_{−∞}^{x} f(x′) dx′,   F(−∞) = 0,   F(∞) = 1

Example: decay time t of a radioactive nucleus with mean lifetime τ:

    f(t) = (1/τ) e^{−t/τ},   F(t) = 1 − e^{−t/τ}
[Figure: the pdf f(t) (scaled) and the CDF F(t) of the exponential decay, plotted against t/s]
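The limits F(−∞) = 0 and F(∞) = 1 are easy to verify for the decay example. A small sketch, assuming an arbitrary mean lifetime of 12 s:

```python
import math

# CDF of the exponential decay-time distribution, F(t) = 1 - exp(-t/tau).
def decay_cdf(t, tau):
    return 1.0 - math.exp(-t / tau)

tau = 12.0  # assumed mean lifetime, in seconds
print(decay_cdf(0.0, tau))    # F(0) = 0
print(decay_cdf(tau, tau))    # F(tau) = 1 - 1/e ≈ 0.632
print(decay_cdf(600.0, tau))  # -> 1 for t >> tau
```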
Expectation values and moments
Mean: a random variable X takes on the values X1, X2, ..., Xn with probabilities p(Xi); then the expected value of X (“mean”) is

    X̄ = ⟨X⟩ = Σ_{i=1}^{n} Xi · p(Xi)

The expected value of an arbitrary function h(x) of a continuous random variable is:

    E[h(x)] = ∫_{−∞}^{∞} h(x) · f(x) dx

The mean is the expected value of x:

    E[x] = x̄ = ∫_{−∞}^{∞} x · f(x) dx
Expectation values and moments
Standard deviation = {mean of (deviation from x̄)²}^{1/2}:

    σ² = ⟨(x − x̄)²⟩ = ∫_{−∞}^{∞} (x − x̄)² · f(x) dx
       = ∫_{−∞}^{∞} (x² − 2x x̄ + x̄²) · f(x) dx = ⟨x²⟩ − 2x̄·x̄ + x̄² = ⟨x²⟩ − x̄²

σ² = variance, σ = standard deviation.
Discrete distributions:

    σ² = (1/N) ( Σx² − (Σx)²/N )

Attention: this is the definition of the variance! To get a bias-free estimate of the variance, 1/N is replaced by 1/(N−1).
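The 1/N definition versus the bias-free 1/(N−1) estimate can be shown side by side. A minimal sketch; the data values are an arbitrary example of my own:

```python
# Population (1/N) variance versus the bias-free 1/(N-1) sample estimate.
def variance(xs, unbiased=False):
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)
    return ss / (n - 1) if unbiased else ss / n

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(variance(data))                  # 4.0   (1/N definition)
print(variance(data, unbiased=True))   # ~4.571 (bias-free estimate)
```

The difference matters for small samples and vanishes as N grows.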
Expectation values and moments
Moments are the expected values of xⁿ and of (x − ⟨x⟩)ⁿ. They are called the nth algebraic moment µn and the nth central moment µ′n, respectively.
Skewness v(x) is a measure of the asymmetry of the probability distribution of a random variable x:

    v = µ′3/σ³ = E[(x − E[x])³]/σ³

Kurtosis is a measure of the “peakedness” of the probability distribution of a random variable x:

    γ2 = µ′4/σ⁴ − 3 = E[(x − E[x])⁴]/σ⁴ − 3
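The definitions above translate directly into sample estimates from central moments. A sketch, assuming Gaussian test data (for which both quantities should be close to zero); function names are my own:

```python
import random

# Sample skewness v = mu3'/sigma^3 and excess kurtosis
# gamma2 = mu4'/sigma^4 - 3, estimated from central moments.
def central_moment(xs, k):
    m = sum(xs) / len(xs)
    return sum((x - m) ** k for x in xs) / len(xs)

def skewness(xs):
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def excess_kurtosis(xs):
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3.0

rng = random.Random(1)
gauss = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
print(skewness(gauss), excess_kurtosis(gauss))  # both close to 0
```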
Binomial distribution
The binomial distribution is the discrete probability distribution of the number of successes r in a sequence of n independent yes/no experiments, each of which yields success with probability p (Bernoulli experiment).

    P(r) = (n choose r) · p^r · (1 − p)^{n−r}

P(r) is normalized. Proof: binomial theorem with q = 1 − p.
The mean of r is:

    ⟨r⟩ = E[r] = Σ_{r=0}^{n} r P(r) = np

The variance σ² is:

    V[r] = E[(r − ⟨r⟩)²] = Σ_{r=0}^{n} (r − ⟨r⟩)² P(r) = np(1 − p)
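The sums for the mean and variance can be evaluated directly from P(r). A minimal sketch with n = 10 and p = 0.6, the same values used in a later slide:

```python
from math import comb

# Mean and variance of the binomial distribution computed directly from
# P(r) = C(n,r) p^r (1-p)^(n-r); they reproduce np and np(1-p).
def binom_pmf(r, n, p):
    return comb(n, r) * p**r * (1.0 - p) ** (n - r)

n, p = 10, 0.6
mean = sum(r * binom_pmf(r, n, p) for r in range(n + 1))
var = sum((r - mean) ** 2 * binom_pmf(r, n, p) for r in range(n + 1))
print(mean, var)  # np = 6.0 and np(1-p) = 2.4
```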
Poisson distribution
The Poisson distribution is given by:

    P(r) = µ^r e^{−µ} / r!

The mean is:  ⟨r⟩ = µ
The variance is:  V[r] = σ² = np = µ  (the binomial limit with np → µ)
[Figure: Poisson distributions P(r) for µ = 0.5, 1, 2 and 4, plotted for r = 0 ... 10]
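That the mean and variance both equal µ can be checked by summing over the pmf. A sketch for µ = 4 (one of the plotted cases), with a finite cutoff far in the tail:

```python
from math import exp, factorial

# Poisson pmf P(r) = mu^r e^(-mu) / r!; mean and variance both equal mu.
def poisson_pmf(r, mu):
    return mu**r * exp(-mu) / factorial(r)

mu = 4.0
rs = range(101)  # cutoff far in the tail for mu = 4
norm = sum(poisson_pmf(r, mu) for r in rs)
mean = sum(r * poisson_pmf(r, mu) for r in rs)
var = sum((r - mean) ** 2 * poisson_pmf(r, mu) for r in rs)
print(norm, mean, var)  # ~1, ~4, ~4
```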
Law of large numbers
The law of large numbers (LLN) is a theorem that describes the result of performing the same experiment a large number of times.
According to the law, the average of the results obtained from a large number of trials should be close to the expected value, and it tends to get closer as more trials are performed.
We perform n independent experiments (Bernoulli trials) in which the result j occurs nj times; hj = nj/n is its relative frequency:

    pj = E[hj] = E[nj/n]

The variance of a binomial distribution gives:

    V[hj] = σ²(hj) = σ²(nj/n) = (1/n²) · σ²(nj) = (1/n²) · n pj(1 − pj)

Since the product pj(1 − pj) is ≤ 1/4, we can deduce the law of large numbers:

    σ²(hj) ≤ 1/(4n) < 1/n
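The bound above says the spread of the relative frequency shrinks like 1/√n. A small simulation sketch with fair coin flips (pj = 1/2; the seed is my own choice):

```python
import random

# LLN sketch: the relative frequency h_j = n_j/n of "heads" converges
# to p_j = 1/2, with variance bounded by p_j(1-p_j)/n <= 1/(4n).
rng = random.Random(7)

def relative_frequency(n, p=0.5):
    return sum(rng.random() < p for _ in range(n)) / n

for n in (100, 10_000, 1_000_000):
    h = relative_frequency(n)
    print(n, h, abs(h - 0.5))
```

Each factor-of-100 increase in n shrinks the typical deviation by roughly a factor of 10.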
The central limit theorem
The central limit theorem (CLT) states conditions under which the mean of a sufficiently large number of independent random variables, each with finite mean and variance, will be approximately normally distributed.
Let xi be a sequence of n independent and identically distributed random variables, each having finite expectation µ and variance σ² > 0. In the limit n → ∞ the random variable w = Σ_{i=1}^{n} xi will be normally distributed with mean ⟨w⟩ = n⟨x⟩ and variance V[w] = nσ².
Illustration: The central limit theorem
[Figure: distributions of the sum of N uniform random variables for N = 1, 2, 3 and 10, each compared with the standard normal curve]
The sum of uniformly distributed random variables and the standard normal distribution.
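The illustration can be reproduced numerically: standardizing the sum of N uniform variables should give something close to N(0,1). A sketch with N = 10 (the seed and sample size are my own choices):

```python
import random
import statistics

# CLT sketch: the standardized sum of N uniform [0,1) variables
# (mean 1/2, variance 1/12 each) approaches N(0,1).
rng = random.Random(3)

def standardized_sum(n):
    s = sum(rng.random() for _ in range(n))
    return (s - n * 0.5) / (n / 12.0) ** 0.5

samples = [standardized_sum(10) for _ in range(50_000)]
print(statistics.mean(samples), statistics.stdev(samples))  # ~0 and ~1
```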
Special probability densities
Uniform distribution: this probability distribution is constant between the limits x = a and x = b:

    f(x) = 1/(b − a)  for a ≤ x < b,   0 otherwise

Mean and variance:

    ⟨x⟩ = E[x] = (a + b)/2,    V[x] = σ² = (b − a)²/12
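The moments of the uniform distribution can be verified by simple numerical integration. A sketch using the midpoint rule on [0, 1) (the integration scheme and step count are my own choices):

```python
# Mean and variance of the uniform distribution on [a, b) by midpoint-
# rule integration; they match (a+b)/2 and (b-a)^2/12.
def uniform_moments(a, b, steps=100_000):
    dx = (b - a) / steps
    f = 1.0 / (b - a)
    xs = [a + (i + 0.5) * dx for i in range(steps)]
    mean = sum(x * f * dx for x in xs)
    var = sum((x - mean) ** 2 * f * dx for x in xs)
    return mean, var

print(uniform_moments(0.0, 1.0))  # ~(0.5, 0.0833...)
```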
Gaussian distribution
The most important probability distribution, also called the normal distribution:

    f(x) = 1/(√(2π)σ) · e^{−(x−µ)²/(2σ²)}

The Gaussian distribution has two parameters, the mean µ and the variance σ². The distribution with mean µ = 0 and variance σ² = 1 is called the standard normal distribution, N(0,1) for short.
The Gaussian distribution can be derived from the binomial distribution for large values of n and r, and similarly from the Poisson distribution for large values of µ.
Numerical Methods for Data Analysis
Gaussian distribution
    ∫_{−1}^{+1} N(0,1) dx = 0.6827 = (1 − 0.3173)
    ∫_{−2}^{+2} N(0,1) dx = 0.9545 = (1 − 0.0455)
    ∫_{−3}^{+3} N(0,1) dx = 0.9973 = (1 − 0.0027)

FWHM, useful to estimate the standard deviation:

    FWHM = 2σ √(2 ln 2) = 2.355σ
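The interval probabilities and the FWHM factor can be reproduced with the error function, using P(|x| < k) = erf(k/√2) for N(0,1):

```python
import math

# Coverage of the ±1σ, ±2σ, ±3σ intervals of N(0,1) via the error
# function, plus the FWHM conversion factor 2·sqrt(2 ln 2).
for k in (1, 2, 3):
    print(k, math.erf(k / math.sqrt(2.0)))  # 0.6827, 0.9545, 0.9973

print(2.0 * math.sqrt(2.0 * math.log(2.0)))  # 2.355
```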
Gaussian distribution
[Figure: two panels comparing discrete distributions with Gaussian curves]

Left side:  the binomial distribution for n = 10 and p = 0.6 in comparison to the Gaussian distribution with µ = np = 6 and σ = √(np(1−p)) = √2.4.
Right side: the Poisson distribution with µ = 6 and σ = √6 in comparison to the Gaussian distribution.