Numerical Methods for Data Analysis
Michael O. [email protected]
Bosen (Saar), August 29 - September 3, 2010
Fundamentals
  - Probability distributions
  - Expectation values, error propagation
  - Parameter estimation
Regression analysis
  - Maximum likelihood
  - Linear regression
Advanced topics
Some statistics books, papers, etc.
Volker Blobel und Erich Lohrmann: Statistische und numerische Methoden der Datenanalyse, Teubner Verlag (1998)
Siegmund Brandt: Datenanalyse, BI Wissenschaftsverlag (1999)
Philip R. Bevington: Data Reduction and Error Analysis for the Physical Sciences, McGraw-Hill (1969)
Roger J. Barlow: Statistics, John Wiley & Sons (1993)
Glen Cowan: Statistical Data Analysis, Oxford University Press (1998)
Frederick James: Statistical Methods in Experimental Physics, 2nd Edition, World Scientific (2006)
Wes Metzger’s lecture notes: www.hef.kun.nl/~wes/stat_course/statist.pdf
Glen Cowan’s lecture notes: www.pp.rhul.ac.uk/~cowan/stat_course.html
Particle Physics Booklet: http://pdg.lbl.gov/
Introduction
Data analysis in nuclear and particle physics
Observe events of a certain type.
Measure characteristics of each event.
Theories predict distributions of these properties up to free parameters.
Some tasks of data analysis:
  - Estimate (measure) the parameters;
  - Quantify the uncertainty of the parameter estimates;
  - Test the extent to which the predictions of a theory are in agreement with the data.
Introduction
Philosophy of Science
Karl R. Popper (born 28 July 1902 in Vienna, Austria; died 17 September 1994 in London, England) coined the term critical rationalism. At the heart of his philosophy of science lies the account of the logical asymmetry between verification and falsifiability. Logik der Forschung, 1934.

→ Existence of a true value of measured quantities and derived values.
Theory of probability
Probability theory, mathematics:
→ Kolmogorov axioms

Classical interpretation, frequentist probability:
Pragmatic definition of probability:

    p(E) = lim_{N→∞} n/N

    n = number of occurrences of the event E
    N = number of trials (experiments)

Experiments have to be repeatable (in principle).
Disadvantage: strictly speaking, one cannot make statements about the probability of any true value. Only upper and lower limits are possible at a given confidence level.
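The frequentist definition p(E) = lim n/N can be made concrete with a small simulation. This is an illustrative sketch only; the event (a die showing a six), the seed, and the trial counts are my own choices, not from the lecture:

```python
import random

# Frequentist estimate of p(E) for the event "a die shows a six":
# the relative frequency n/N approaches p(E) = 1/6 as N grows.
def empirical_probability(trials, p_true=1/6, seed=42):
    rng = random.Random(seed)
    n = sum(1 for _ in range(trials) if rng.random() < p_true)
    return n / trials

for N in (100, 10_000, 1_000_000):
    print(N, empirical_probability(N))
```

The estimate fluctuates for small N and settles near 1/6 for large N, which is exactly the "repeatable experiments" requirement of the frequentist picture.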
Theory of probability
Probability theory, mathematics
Classical interpretation, frequentist probability
Bayesian statistics, subjective probability:
Prior subjective assumptions enter into the calculation of the probability of a hypothesis H.

    p(H) = degree of belief that H is true

Metaphorically speaking: probabilities are the ratio of the (maximum) wager and the anticipated prize in a bet.
Suppose there is a town with green and yellow taxicabs. In a hit-and-run accident a man was hurt and a witness saw a green cab.
In court the lawyer of the taxi company impeaches the credibility of the witness because of the lighting conditions: a test showed that under similar conditions 10% of the witnesses confuse the color of the cabs.
Would you believe the witness?
What if there were 20 times more yellow cabs than green cabs? Would you still believe the witness?
taxicabs      witness says ...     "green" statements are ...
200 yellow    180 × "yellow"
              20 × "green"         20/29 = 69% wrong
10 green      9 × "green"          9/29 = 31% true
              1 × "yellow"
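The table's numbers follow from Bayes' theorem. A minimal sketch of that computation, assuming the slide's scenario (200 yellow and 10 green cabs, 10% colour confusion); the function name is my own:

```python
# Posterior probability that the cab really was green, given the
# witness says "green" (Bayes' theorem with the slide's numbers).
def posterior_green(n_green, n_yellow, p_correct=0.9):
    p_green = n_green / (n_green + n_yellow)
    p_yellow = 1.0 - p_green
    # p(says "green") = p(green)·p_correct + p(yellow)·(1 − p_correct)
    p_says_green = p_green * p_correct + p_yellow * (1 - p_correct)
    return p_green * p_correct / p_says_green

print(posterior_green(10, 200))  # 9/29 ≈ 0.31: the witness is probably wrong
```

With 20 times more yellow than green cabs, the prior dominates the 90% reliability of the witness.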
Disadvantage: prior hypotheses influence the probability.
Advantages for rare and one-time events, like noisy signals or catastrophe modeling.

In this lecture we will focus on classical statistics, i.e. error estimates have to be understood as confidence regions.
Combining probabilities
Two kinds of events are given: A and B. The probability of A is p(A) (of B: p(B)). Then the probability of A or B is:

    p(A or B) = p(A) + p(B) − p(A and B)

If A and B are mutually exclusive, then p(A and B) = 0.
Example: drawing from a deck of 32 German Skat cards:

    p(ace or spades) = 4/32 + 8/32 − 1/32 = 11/32

Special case: B = Ā (A does NOT occur):

    p(A or Ā) = p(A) + p(Ā) = 1
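The Skat example can be checked by brute-force enumeration. A small sketch, with rank and suit labels of my own choosing for the 32-card deck:

```python
from fractions import Fraction

# Inclusion-exclusion on a 32-card Skat deck: 4 aces, 8 spades,
# and the ace of spades counted in both sets.
ranks = ["7", "8", "9", "10", "J", "Q", "K", "A"]
suits = ["clubs", "spades", "hearts", "diamonds"]
deck = [(r, s) for r in ranks for s in suits]

hits = [c for c in deck if c[0] == "A" or c[1] == "spades"]
p = Fraction(len(hits), len(deck))
print(p)  # 11/32, matching 4/32 + 8/32 - 1/32
```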
Combining probabilities
Joint probability of A and B occurring simultaneously:

    p(A and B) = p(A) · p(B|A),

where p(B|A) is called the conditional probability. If A and B are independent, one gets p(B|A) = p(B), and thus

    p(A and B) = p(A) · p(B)
Death in the mountains
In a book on the mountaineering achievements of Reinhold Messner one reads the following: “If you consider that the probability of dying in an expedition to an eight-thousander is 3.4%, then Messner had a probability of 3.4% · 29 = 99% of being killed during his 29 expeditions.”
That cannot be right: what if Messner had set off on a 30th expedition?
The probability of surviving one expedition is obviously 1 − 0.034 = 0.966. If one assumes that the various expeditions are independent events, the probability of surviving all 29 expeditions is: P = 0.966^29 = 0.367.
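The slide's argument can be put directly into code, contrasting the correct product of independent survival probabilities with the book's naive n·p figure (the function name is my own):

```python
# Survival of n independent expeditions, each with death probability p,
# versus the book's naive n*p "calculation".
def p_survive_all(p_death, n):
    return (1.0 - p_death) ** n

p, n = 0.034, 29
print(f"naive n*p of dying: {n * p:.3f}")   # already exceeds 1 for n >= 30
print(f"p(survive all 29):  {p_survive_all(p, n):.3f}")
```

The naive product is not even a probability for large n, while the correct survival probability simply keeps shrinking toward zero.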
Definitions
Probability mass function (pmf) and probability density function (pdf) of a measured value (= random variable):

[Figure: a discrete pmf f(n) versus n, and a continuous pdf f(x) versus x]

f(n) discrete, f(x) continuous.

Normalization:

    f(n) ≥ 0,   Σ_n f(n) = 1
    f(x) ≥ 0,   ∫_{−∞}^{∞} f(x) dx = 1

Probability:

    p(n1 ≤ n ≤ n2) = Σ_{n=n1}^{n2} f(n)
    p(x1 ≤ x ≤ x2) = ∫_{x1}^{x2} f(x) dx
Definitions
Cumulative distribution function (CDF):
    F(x) = ∫_{−∞}^{x} f(x′) dx′,   F(−∞) = 0,   F(∞) = 1

Example: decay time t of a radioactive nucleus with mean lifetime τ:

    f(t) = (1/τ) e^{−t/τ},   F(t) = 1 − e^{−t/τ}
[Figure: the pdf f(t) (scaled) and the CDF F(t) of the exponential decay, plotted against t/s]
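The limits F(−∞) = 0 and F(∞) = 1 are easy to verify for the decay example. A small sketch, assuming an arbitrary mean lifetime of 12 s:

```python
import math

# CDF of the exponential decay-time distribution, F(t) = 1 - exp(-t/tau).
def decay_cdf(t, tau):
    return 1.0 - math.exp(-t / tau)

tau = 12.0  # assumed mean lifetime, in seconds
print(decay_cdf(0.0, tau))    # F(0) = 0
print(decay_cdf(tau, tau))    # F(tau) = 1 - 1/e ≈ 0.632
print(decay_cdf(600.0, tau))  # -> 1 for t >> tau
```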
Expectation values and moments
Mean: a random variable X takes on the values X1, X2, ..., Xn with probabilities p(Xi); then the expected value of X (“mean”) is

    X̄ = ⟨X⟩ = Σ_{i=1}^{n} Xi · p(Xi)

The expected value of an arbitrary function h(x) of a continuous random variable is:

    E[h(x)] = ∫_{−∞}^{∞} h(x) · f(x) dx

The mean is the expected value of x:

    E[x] = x̄ = ∫_{−∞}^{∞} x · f(x) dx
Expectation values and moments
Standard deviation = {mean of (deviation from x̄)²}^{1/2}:

    σ² = ⟨(x − x̄)²⟩ = ∫_{−∞}^{∞} (x − x̄)² · f(x) dx
       = ∫_{−∞}^{∞} (x² − 2x x̄ + x̄²) · f(x) dx = ⟨x²⟩ − 2x̄·x̄ + x̄² = ⟨x²⟩ − x̄²

σ² = variance, σ = standard deviation.
Discrete distributions:

    σ² = (1/N) ( Σx² − (Σx)²/N )

Attention: this is the definition of the variance! To get a bias-free estimate of the variance, 1/N is replaced by 1/(N−1).
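The 1/N definition versus the bias-free 1/(N−1) estimate can be shown side by side. A minimal sketch; the data values are an arbitrary example of my own:

```python
# Population (1/N) variance versus the bias-free 1/(N-1) sample estimate.
def variance(xs, unbiased=False):
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)
    return ss / (n - 1) if unbiased else ss / n

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(variance(data))                  # 4.0   (1/N definition)
print(variance(data, unbiased=True))   # ~4.571 (bias-free estimate)
```

The difference matters for small samples and vanishes as N grows.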
Expectation values and moments
Moments are the expected values of xⁿ and of (x − ⟨x⟩)ⁿ. They are called the nth algebraic moment µn and the nth central moment µ′n, respectively.
Skewness v(x) is a measure of the asymmetry of the probability distribution of a random variable x:

    v = µ′3/σ³ = E[(x − E[x])³]/σ³

Kurtosis is a measure of the “peakedness” of the probability distribution of a random variable x:

    γ2 = µ′4/σ⁴ − 3 = E[(x − E[x])⁴]/σ⁴ − 3
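The definitions above translate directly into sample estimates from central moments. A sketch, assuming Gaussian test data (for which both quantities should be close to zero); function names are my own:

```python
import random

# Sample skewness v = mu3'/sigma^3 and excess kurtosis
# gamma2 = mu4'/sigma^4 - 3, estimated from central moments.
def central_moment(xs, k):
    m = sum(xs) / len(xs)
    return sum((x - m) ** k for x in xs) / len(xs)

def skewness(xs):
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def excess_kurtosis(xs):
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3.0

rng = random.Random(1)
gauss = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
print(skewness(gauss), excess_kurtosis(gauss))  # both close to 0
```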
Binomial distribution
The binomial distribution is the discrete probability distribution of the number of successes r in a sequence of n independent yes/no experiments, each of which yields success with probability p (Bernoulli experiment).

    P(r) = (n choose r) · p^r · (1 − p)^{n−r}

P(r) is normalized. Proof: binomial theorem with q = 1 − p.
The mean of r is:

    ⟨r⟩ = E[r] = Σ_{r=0}^{n} r P(r) = np

The variance σ² is:

    V[r] = E[(r − ⟨r⟩)²] = Σ_{r=0}^{n} (r − ⟨r⟩)² P(r) = np(1 − p)
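The sums for the mean and variance can be evaluated directly from P(r). A minimal sketch with n = 10 and p = 0.6, the same values used in a later slide:

```python
from math import comb

# Mean and variance of the binomial distribution computed directly from
# P(r) = C(n,r) p^r (1-p)^(n-r); they reproduce np and np(1-p).
def binom_pmf(r, n, p):
    return comb(n, r) * p**r * (1.0 - p) ** (n - r)

n, p = 10, 0.6
mean = sum(r * binom_pmf(r, n, p) for r in range(n + 1))
var = sum((r - mean) ** 2 * binom_pmf(r, n, p) for r in range(n + 1))
print(mean, var)  # np = 6.0 and np(1-p) = 2.4
```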
Poisson distribution
The Poisson distribution is given by:

    P(r) = µ^r e^{−µ} / r!

The mean is:  ⟨r⟩ = µ
The variance is:  V[r] = σ² = np = µ  (the binomial limit with np → µ)
[Figure: Poisson distributions P(r) for µ = 0.5, 1, 2 and 4, plotted for r = 0 ... 10]
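That the mean and variance both equal µ can be checked by summing over the pmf. A sketch for µ = 4 (one of the plotted cases), with a finite cutoff far in the tail:

```python
from math import exp, factorial

# Poisson pmf P(r) = mu^r e^(-mu) / r!; mean and variance both equal mu.
def poisson_pmf(r, mu):
    return mu**r * exp(-mu) / factorial(r)

mu = 4.0
rs = range(101)  # cutoff far in the tail for mu = 4
norm = sum(poisson_pmf(r, mu) for r in rs)
mean = sum(r * poisson_pmf(r, mu) for r in rs)
var = sum((r - mean) ** 2 * poisson_pmf(r, mu) for r in rs)
print(norm, mean, var)  # ~1, ~4, ~4
```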
Law of large numbers
The law of large numbers (LLN) is a theorem that describes the result of performing the same experiment a large number of times.
According to the law, the average of the results obtained from a large number of trials should be close to the expected value, and it tends to get closer as more trials are performed.
We perform n independent experiments (Bernoulli trials) in which the result j occurs nj times; hj = nj/n is its relative frequency:

    pj = E[hj] = E[nj/n]

The variance of a binomial distribution gives:

    V[hj] = σ²(hj) = σ²(nj/n) = (1/n²) · σ²(nj) = (1/n²) · n pj(1 − pj)

Since the product pj(1 − pj) is ≤ 1/4, we can deduce the law of large numbers:

    σ²(hj) ≤ 1/(4n) < 1/n
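The bound above says the spread of the relative frequency shrinks like 1/√n. A small simulation sketch with fair coin flips (pj = 1/2; the seed is my own choice):

```python
import random

# LLN sketch: the relative frequency h_j = n_j/n of "heads" converges
# to p_j = 1/2, with variance bounded by p_j(1-p_j)/n <= 1/(4n).
rng = random.Random(7)

def relative_frequency(n, p=0.5):
    return sum(rng.random() < p for _ in range(n)) / n

for n in (100, 10_000, 1_000_000):
    h = relative_frequency(n)
    print(n, h, abs(h - 0.5))
```

Each factor-of-100 increase in n shrinks the typical deviation by roughly a factor of 10.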
The central limit theorem
The central limit theorem (CLT) states conditions under which the mean of a sufficiently large number of independent random variables, each with finite mean and variance, will be approximately normally distributed.
Let xi be a sequence of n independent and identically distributed random variables, each having finite expectation µ and variance σ² > 0. In the limit n → ∞ the random variable w = Σ_{i=1}^{n} xi will be normally distributed with mean ⟨w⟩ = n⟨x⟩ and variance V[w] = nσ².
Illustration: The central limit theorem
[Figure: distributions of the sum of N uniform random variables for N = 1, 2, 3 and 10, each compared with the standard normal curve]
The sum of uniformly distributed random variables and the standard normal distribution.
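The illustration can be reproduced numerically: standardizing the sum of N uniform variables should give something close to N(0,1). A sketch with N = 10 (the seed and sample size are my own choices):

```python
import random
import statistics

# CLT sketch: the standardized sum of N uniform [0,1) variables
# (mean 1/2, variance 1/12 each) approaches N(0,1).
rng = random.Random(3)

def standardized_sum(n):
    s = sum(rng.random() for _ in range(n))
    return (s - n * 0.5) / (n / 12.0) ** 0.5

samples = [standardized_sum(10) for _ in range(50_000)]
print(statistics.mean(samples), statistics.stdev(samples))  # ~0 and ~1
```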
Special probability densities
Uniform distribution: this probability distribution is constant between the limits x = a and x = b:

    f(x) = 1/(b − a)  for a ≤ x < b,   0 otherwise

Mean and variance:

    ⟨x⟩ = E[x] = (a + b)/2,    V[x] = σ² = (b − a)²/12
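The moments of the uniform distribution can be verified by simple numerical integration. A sketch using the midpoint rule on [0, 1) (the integration scheme and step count are my own choices):

```python
# Mean and variance of the uniform distribution on [a, b) by midpoint-
# rule integration; they match (a+b)/2 and (b-a)^2/12.
def uniform_moments(a, b, steps=100_000):
    dx = (b - a) / steps
    f = 1.0 / (b - a)
    xs = [a + (i + 0.5) * dx for i in range(steps)]
    mean = sum(x * f * dx for x in xs)
    var = sum((x - mean) ** 2 * f * dx for x in xs)
    return mean, var

print(uniform_moments(0.0, 1.0))  # ~(0.5, 0.0833...)
```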
Gaussian distribution
The most important probability distribution, also called the normal distribution:

    f(x) = 1/(√(2π)σ) · e^{−(x−µ)²/(2σ²)}

The Gaussian distribution has two parameters, the mean µ and the variance σ². The distribution with mean µ = 0 and variance σ² = 1 is called the standard normal distribution, N(0,1) for short.
The Gaussian distribution can be derived from the binomial distribution for large values of n and r, and similarly from the Poisson distribution for large values of µ.
Numerical Methods for Data Analysis
Gaussian distribution
    ∫_{−1}^{+1} N(0,1) dx = 0.6827 = (1 − 0.3173)
    ∫_{−2}^{+2} N(0,1) dx = 0.9545 = (1 − 0.0455)
    ∫_{−3}^{+3} N(0,1) dx = 0.9973 = (1 − 0.0027)

FWHM, useful to estimate the standard deviation:

    FWHM = 2σ √(2 ln 2) = 2.355σ
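The interval probabilities and the FWHM factor can be reproduced with the error function, using P(|x| < k) = erf(k/√2) for N(0,1):

```python
import math

# Coverage of the ±1σ, ±2σ, ±3σ intervals of N(0,1) via the error
# function, plus the FWHM conversion factor 2·sqrt(2 ln 2).
for k in (1, 2, 3):
    print(k, math.erf(k / math.sqrt(2.0)))  # 0.6827, 0.9545, 0.9973

print(2.0 * math.sqrt(2.0 * math.log(2.0)))  # 2.355
```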
Gaussian distribution
[Figure: two panels comparing discrete distributions with Gaussian curves]

Left side:  the binomial distribution for n = 10 and p = 0.6 in comparison to the Gaussian distribution with µ = np = 6 and σ = √(np(1−p)) = √2.4.
Right side: the Poisson distribution with µ = 6 and σ = √6 in comparison to the Gaussian distribution.