Fall 2012 Analysis of Experimental Measurements B. Eisenstein/rev. S. Errede
P598AEM Lecture Notes 09
The Multivariate Gaussian/Normal Distribution:
Given $N$ centered (i.e. $\hat{x}_i = 0$) random variables $x_1, x_2, \ldots, x_N$:

$$\vec{x} \equiv \begin{pmatrix} x_1 \\ \vdots \\ x_N \end{pmatrix} \quad \text{with:} \quad \hat{x} \equiv E[\vec{x}] = \begin{pmatrix} \hat{x}_1 \\ \vdots \\ \hat{x}_N \end{pmatrix} = \vec{0} \;\;(= \text{the zero, aka the null, vector})$$

Suppose the $x_i$ are all Gaussian/normally distributed and independent. Then their joint P.D.F. will be:
$$f(x_1, x_2, \ldots, x_N) = f(x_1)\cdot f(x_2)\cdots f(x_N) = \frac{1}{\sqrt{2\pi}\,\sigma_1}\,e^{-x_1^2/2\sigma_1^2} \cdot \frac{1}{\sqrt{2\pi}\,\sigma_2}\,e^{-x_2^2/2\sigma_2^2} \cdots \frac{1}{\sqrt{2\pi}\,\sigma_N}\,e^{-x_N^2/2\sigma_N^2}$$

$$= \frac{1}{\left(\sqrt{2\pi}\right)^N \sigma_1\sigma_2\cdots\sigma_N}\, e^{-\frac{1}{2}\left(x_1^2/\sigma_1^2 + x_2^2/\sigma_2^2 + \cdots + x_N^2/\sigma_N^2\right)} = \frac{1}{\left(\sqrt{2\pi}\right)^N \prod_{i=1}^{N}\sigma_i}\, e^{-\frac{1}{2}\sum_{i=1}^{N} x_i^2/\sigma_i^2}$$
As we discussed previously in P598AEM Lect. Notes 6 (p. 15), we can write the argument of the exponential function as the matrix product $-\frac{1}{2}\vec{x}^T \hat{V}_x^{-1} \vec{x}$, where $\hat{V}_x^{-1}$ is the inverse of the x-basis variance matrix $\hat{V}_x$:

$$\hat{V}_x^{-1} \equiv \begin{pmatrix} 1/\sigma_1^2 & 0 & \cdots & 0 \\ 0 & 1/\sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/\sigma_N^2 \end{pmatrix} \quad \text{and:}$$
$$\hat{V}_x \equiv E[\vec{x}\,\vec{x}^T] = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_N^2 \end{pmatrix} \quad \text{with:} \quad \hat{V}_x^{-1}\hat{V}_x = \hat{V}_x\hat{V}_x^{-1} = \mathbb{1}$$
Note further that:

$$\det\hat{V}_x^{-1} = \frac{1}{\sigma_1^2\,\sigma_2^2\cdots\sigma_N^2} = \frac{1}{\prod_{i=1}^{N}\sigma_i^2} = \frac{1}{\det\hat{V}_x}$$
Thus, for independent $x_1, x_2, \ldots, x_N$:

$$f(\vec{x}) \equiv f(x_1, x_2, \ldots, x_N) = \frac{1}{\left(\sqrt{2\pi}\right)^N \prod_{i=1}^{N}\sigma_i}\, e^{-\frac{1}{2}\sum_{i=1}^{N} x_i^2/\sigma_i^2} = \frac{1}{\left(\sqrt{2\pi}\right)^N \sqrt{\det\hat{V}_x}}\, e^{-\frac{1}{2}\vec{x}^T \hat{V}_x^{-1} \vec{x}}$$
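As a quick numerical cross-check of this equivalence, the following sketch (not part of the original notes; the $\sigma_i$ values and the test point are assumed example numbers) evaluates the product of one-dimensional Gaussians and the matrix form side by side:

```python
import numpy as np

# Assumed example sigmas and an assumed test point:
sigma = np.array([1.0, 2.0, 0.5])
x = np.array([0.3, -1.2, 0.7])
N = len(sigma)

# Product of one-dimensional Gaussian PDFs:
f_prod = np.prod(np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma))

# Matrix form with the diagonal variance matrix V_x:
V = np.diag(sigma**2)
Vinv = np.linalg.inv(V)
f_mat = np.exp(-0.5 * x @ Vinv @ x) / ((2 * np.pi)**(N / 2) * np.sqrt(np.linalg.det(V)))

print(f_prod, f_mat)   # the two values agree to machine precision
```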
From discussions in previous P598AEM lectures (and HW), we know that if we start with independent random variables $x_1, x_2, \ldots, x_N$ and then make a change to a new set of random variables $y_1, y_2, \ldots, y_N$, the new random variables are (usually) dependent. We can also "go the other way":
Suppose that the $y_1, y_2, \ldots, y_N$ are $N$ centered (i.e. $\hat{y}_i = 0$) random variables, whose covariance matrix is:

$$\hat{V}_y \equiv E[\vec{y}\,\vec{y}^T] = \begin{pmatrix} \sigma_{y_1}^2 & \mathrm{cov}(y_1, y_2) & \cdots & \mathrm{cov}(y_1, y_N) \\ \mathrm{cov}(y_2, y_1) & \sigma_{y_2}^2 & \cdots & \mathrm{cov}(y_2, y_N) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{cov}(y_N, y_1) & \mathrm{cov}(y_N, y_2) & \cdots & \sigma_{y_N}^2 \end{pmatrix}$$
Since $\hat{V}_y$ is real and symmetric, it can be diagonalized by an Orthogonal Transformation $C$, i.e. there exists an orthogonal matrix $C$ for which $C\hat{V}_y C^T$ is diagonal.
In that case, we can also define: $\vec{x} = C\vec{y}$. Then the diagonal x-basis variance matrix is:

$$\hat{V}_x \equiv E[\vec{x}\,\vec{x}^T] = E[(C\vec{y})(C\vec{y})^T] = E[C\vec{y}\,\vec{y}^T C^T] = C\,E[\vec{y}\,\vec{y}^T]\,C^T = C\hat{V}_y C^T$$
Thus, we see that the random variables $x_i$ are indeed independent: their covariance matrix is diagonal, i.e. they are uncorrelated, and for jointly Gaussian random variables uncorrelated implies independent. Conclusion: since one can go from any set of $N$ y-basis random variables to $N$ independent x-basis random variables by means of an orthogonal transformation, the reverse (i.e. the inverse transformation) must also be true. That is, starting with a set of $N$ independent x-basis random variables, we can go to any set of $N$ non-independent y-basis random variables via some orthogonal transformation (matrix) $C$ (e.g. an N-dimensional rotation of basis vectors).
Since $C$ is an orthogonal matrix, $C^{-1} \equiv C^T$, and therefore if $\vec{x} = C\vec{y}$ then $\vec{y} = C^{-1}\vec{x} = C^T\vec{x}$.

Proof: $\vec{x} = C\vec{y} \;\Rightarrow\; C^{-1}\vec{x} = C^{-1}C\vec{y} = \vec{y} \;\Rightarrow\; \vec{y} = C^{-1}\vec{x} = C^T\vec{x}$. Q.E.D.
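The diagonalization argument above is easy to demonstrate numerically. A minimal sketch, assuming an example 3×3 covariance matrix, using numpy's eigendecomposition of a symmetric matrix (one valid way to construct such a $C$; the notes do not prescribe a particular construction):

```python
import numpy as np

# Assumed example covariance matrix: real, symmetric, positive definite.
V_y = np.array([[2.0, 0.8, 0.3],
                [0.8, 1.5, 0.5],
                [0.3, 0.5, 1.0]])

# np.linalg.eigh gives V_y = U diag(w) U^T with U orthogonal;
# taking C = U^T yields C V_y C^T = diag(w), a diagonal x-basis matrix.
w, U = np.linalg.eigh(V_y)
C = U.T

V_x = C @ V_y @ C.T
print(np.round(V_x, 12))                 # diagonal; entries are the eigenvalues w
print(np.allclose(C @ C.T, np.eye(3)))   # True: C is orthogonal
print(np.linalg.det(C))                  # +/-1, as for any orthogonal matrix
```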
Let us now return to the P.D.F. for independent, centered $x_1, x_2, \ldots, x_N$:

$$f(\vec{x}) = \frac{1}{\left(\sqrt{2\pi}\right)^N \sqrt{\det\hat{V}_x}}\, e^{-\frac{1}{2}\vec{x}^T \hat{V}_x^{-1} \vec{x}}$$
Let us make an orthogonal change of random variables and find the P.D.F. for $\vec{y}$.
Setting $\vec{x} = C\vec{y}$ (and $\therefore\ \vec{x}^T = (C\vec{y})^T = \vec{y}^T C^T$), then:

$$\vec{x}^T \hat{V}_x^{-1} \vec{x} = \vec{y}^T C^T \hat{V}_x^{-1} C \vec{y} \quad\text{but:}\quad C^T \hat{V}_x^{-1} C = \hat{V}_y^{-1}$$

Thus: $\vec{x}^T \hat{V}_x^{-1} \vec{x} = \vec{y}^T C^T \hat{V}_x^{-1} C \vec{y} = \vec{y}^T \hat{V}_y^{-1} \vec{y}$.
From above, the diagonal x-basis variance matrix is: $\hat{V}_x = C\hat{V}_y C^T$.

Multiplying this relation on the LHS by $C^T$ and on the RHS by $C$, we get:

$$C^T \hat{V}_x C = C^T C\, \hat{V}_y\, C^T C \quad\text{and since}\quad C^T C = C C^T = C^{-1}C = CC^{-1} = \mathbb{1}\ (\text{the unit matrix}),$$

then: $\hat{V}_y = C^T \hat{V}_x C$, the y-basis covariance matrix.
Check: does $\hat{V}_y^{-1}\hat{V}_y = \mathbb{1}$? (n.b. we already showed that $\hat{V}_x^{-1}\hat{V}_x = \hat{V}_x\hat{V}_x^{-1} = \mathbb{1}$ above)

$$\hat{V}_y^{-1}\hat{V}_y = \left(C^T \hat{V}_x^{-1} C\right)\left(C^T \hat{V}_x C\right) = C^T \hat{V}_x^{-1} \underbrace{C C^T}_{=\,\mathbb{1}} \hat{V}_x C = C^T \underbrace{\hat{V}_x^{-1}\hat{V}_x}_{=\,\mathbb{1}}\, C = C^T C = \mathbb{1}$$

Yes!
Using $\det(AB) = \det A \cdot \det B$:

$$\det\hat{V}_y = \det\left(C^T \hat{V}_x C\right) = \det\left(C^T\right)\cdot\det\left(\hat{V}_x\right)\cdot\det\left(C\right)$$

But: $\det\left(C^T\right)\cdot\det\left(C\right) = \det\left(C^{-1}\right)\cdot\det\left(C\right) = 1$ (since $C$ is orthogonal).

Thus: $\det\hat{V}_y = \det\hat{V}_x$.
We also need the Jacobian (to derive the P.D.F. for $\vec{y}$). Since $\vec{y} = C^{-1}\vec{x} = C^T\vec{x}$:

$$J\left(\vec{y}, \vec{x}\right) = \det\begin{pmatrix} \dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_1}{\partial x_2} & \cdots & \dfrac{\partial y_1}{\partial x_N} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial y_N}{\partial x_1} & \dfrac{\partial y_N}{\partial x_2} & \cdots & \dfrac{\partial y_N}{\partial x_N} \end{pmatrix} = \det\left(C^T\right) = \det\left(C^{-1}\right)$$

$\Rightarrow\ \det C^{-1} = \det C^T = \pm 1$ (since $C$ is orthogonal), and thus $|J| = 1$.
Thus, when we change variables from the $x$'s to the $y$'s via an orthogonal transformation $C$:

$$h(\vec{y})\,d\vec{y} = f(\vec{x})\,d\vec{x} \;\Rightarrow\; h(\vec{y}) = \frac{f(\vec{x})}{\left|J\left(\vec{y},\vec{x}\right)\right|} = \frac{f(\vec{x})}{\left|\det C^{-1}\right|} = \frac{f(\vec{x})}{\left|\det C^T\right|} = \frac{f(\vec{x})}{\left|\pm 1\right|} = f(\vec{x})$$

n.b. The determinant of an orthogonal matrix $C$ is $\det(C) = \pm 1$.
Then the x-basis P.D.F.

$$f(\vec{x}) \equiv f(x_1, x_2, \ldots, x_N) = \frac{1}{\left(\sqrt{2\pi}\right)^N \sqrt{\det\hat{V}_x}}\, e^{-\frac{1}{2}\vec{x}^T \hat{V}_x^{-1} \vec{x}}$$

becomes the y-basis P.D.F.

$$h(\vec{y}) \equiv h(y_1, y_2, \ldots, y_N) = \frac{1}{\left(\sqrt{2\pi}\right)^N \sqrt{\det\hat{V}_y}}\, e^{-\frac{1}{2}\vec{y}^T \hat{V}_y^{-1} \vec{y}}$$
We can (of course) generalize these results to $N$ non-centered $x$-, $y$-random variables as well:

$$f(\vec{x}) \equiv f(x_1, x_2, \ldots, x_N) = \frac{1}{\left(\sqrt{2\pi}\right)^N \sqrt{\det\hat{V}_x}}\, e^{-\frac{1}{2}(\vec{x}-\hat{x})^T \hat{V}_x^{-1} (\vec{x}-\hat{x})} = \frac{1}{\left(\sqrt{2\pi}\right)^N \sqrt{\det\hat{V}_x}}\, e^{-\frac{1}{2}\hat{R}_x^T \hat{V}_x^{-1} \hat{R}_x}$$

$$h(\vec{y}) \equiv h(y_1, y_2, \ldots, y_N) = \frac{1}{\left(\sqrt{2\pi}\right)^N \sqrt{\det\hat{V}_y}}\, e^{-\frac{1}{2}(\vec{y}-\hat{y})^T \hat{V}_y^{-1} (\vec{y}-\hat{y})} = \frac{1}{\left(\sqrt{2\pi}\right)^N \sqrt{\det\hat{V}_y}}\, e^{-\frac{1}{2}\hat{R}_y^T \hat{V}_y^{-1} \hat{R}_y}$$

where the residual matrices are defined as: $\hat{R}_x \equiv (\vec{x} - \hat{x})$ and $\hat{R}_y \equiv (\vec{y} - \hat{y})$,

and the covariance matrices are defined as: $\hat{V}_x \equiv E[(\vec{x}-\hat{x})(\vec{x}-\hat{x})^T] = E[\hat{R}_x \hat{R}_x^T]$ and $\hat{V}_y \equiv E[(\vec{y}-\hat{y})(\vec{y}-\hat{y})^T] = E[\hat{R}_y \hat{R}_y^T]$.

We shall encounter the argument of the exponential function of the Gaussian/normal P.D.F., $(\vec{y}-\hat{y})^T \hat{V}_y^{-1} (\vec{y}-\hat{y}) = \hat{R}_y^T \hat{V}_y^{-1} \hat{R}_y$, again in the (near) future, when we develop the formalism ("Hypothesis Testing") to ask in a rigorous, quantitative way whether $N$ measurements of random variables $y_1, y_2, \ldots, y_N$ are consistent (or not) with a set of given/specified expectation values $\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_N$ and a given/specified variance matrix $\hat{V}_y$. In particular, we will define the "Chi-Squared" parameter as:

$$\chi^2 \equiv (\vec{y}-\hat{y})^T \hat{V}_y^{-1} (\vec{y}-\hat{y}) = \hat{R}_y^T \hat{V}_y^{-1} \hat{R}_y$$
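A minimal numerical sketch of this $\chi^2$ (the measurement values, expectation values and covariance matrix below are all assumed example numbers, not from the notes):

```python
import numpy as np

y = np.array([10.2, 9.7, 11.1])       # assumed measurements
y_hat = np.array([10.0, 10.0, 10.0])  # assumed expectation values
V_y = np.array([[0.40, 0.10, 0.00],   # assumed covariance matrix
                [0.10, 0.30, 0.05],
                [0.00, 0.05, 0.50]])

R = y - y_hat                          # residual vector R_y
chi2 = R @ np.linalg.solve(V_y, R)     # R^T V_y^{-1} R, via a linear solve
print(chi2)
```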
The Binomial Probability Distribution:

The Binomial Distribution (n.b. a discrete distribution) is relevant for/useful in situations where a fixed number $N$ of independent experimental trials are carried out, in which each individual trial has only two possible outcomes, e.g. flipping a coin: $A$, heads $\equiv$ success (with constant, known probability $p$), or $\bar{A}$, tails $\equiv$ failure (with constant, known probability $q \equiv 1 - p$).
The probability $P(n; N, p)$ of obtaining $n$ successes in $N$ independent, individual trials is given by the Binomial Distribution:

$$P(n; N, p) = \frac{N!}{n!\,(N-n)!}\, p^n q^{N-n} = \frac{N!}{n!\,(N-n)!}\, p^n (1-p)^{N-n} = \binom{N}{n}\, p^n (1-p)^{N-n}$$

with Binomial Coefficient: $\displaystyle\binom{N}{n} \equiv \frac{N!}{n!\,(N-n)!}$
The $p^n$ term is the success probability associated with obtaining $n$ successes in $N$ independent, individual trials.

The $q^{N-n} = (1-p)^{N-n}$ term is the failure probability associated with the remaining $N - n$ unsuccessful/failed trials.

Note also that the joint probability $p^n \cdot (1-p)^{N-n}$ corresponds to only one possible ordering/sequence of $n$ successes and $(N - n)$ failures in $N$ independent trials, since $\underbrace{ppqq\cdots p}_{N}$ and $\underbrace{pqqpp\cdots}_{N}$, etc., are equivalent contributions to $P(n; N, p)$.

The factorial term $N!/\left(n!\,(N-n)!\right)$ thus gives the total # of all possible unique/distinct permutations of the ordering/sequence of $n$ successes and $(N - n)$ failures in $N$ independent trials. We thus see that $n$ is a random variable associated with a fixed number $N$ of independent trials, and ranges (in discrete, integer steps) over $0 \le n \le N$.
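For concreteness, a short sketch evaluating $P(n; N, p)$ numerically; scipy.stats.binom is an assumed convenience tool here (not something the notes use), shown alongside the explicit formula:

```python
import numpy as np
from math import comb
from scipy.stats import binom

N, p = 20, 0.5                         # assumed example values
n = np.arange(N + 1)

P = binom.pmf(n, N, p)                 # P(n; N, p) for n = 0..N
print(P.sum())                         # 1.0: the probabilities sum to unity

# The same value built directly from the formula above, for n = 10:
print(binom.pmf(10, N, p))             # ~0.1762
print(comb(N, 10) * p**10 * (1 - p)**(N - 10))
```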
In the two figures below, the LHS figure shows $P(n; N, p)$ vs. $n$ for three examples of the Binomial Distribution, two with $N = 20$ and $p = 0.5$ & $0.7$, and one with $N = 40$ and $p = 0.5$. The RHS figure shows the corresponding Cumulative Distribution for each of these three:
The expectation/mean value $\hat{n} \equiv E[n] \equiv \mu_n$ of the discrete random variable $n$ associated with the Binomial Distribution with $p$ known is obtained by summing $n \cdot P(n; N, p)$ over all allowed values of $n$ successes:

$$\hat{n} \equiv E[n] \equiv \mu_n = \sum_{n=0}^{N} n\cdot P(n; N, p) = \sum_{n=0}^{N} n\cdot\frac{N!}{n!\,(N-n)!}\, p^n q^{N-n} = \sum_{n=0}^{N} n\cdot\frac{N!}{n!\,(N-n)!}\, p^n (1-p)^{N-n} = Np$$
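A quick numerical check of this result (with assumed example values of $N$ and $p$):

```python
import numpy as np
from scipy.stats import binom

# Check that sum_n n * P(n; N, p) = Np, for assumed N and p:
N, p = 20, 0.7
n = np.arange(N + 1)
print(np.sum(n * binom.pmf(n, N, p)))  # 14.0
print(N * p)                           # 14.0
```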
The variance $\sigma_n^2 \equiv \mathrm{var}(n) = E[(n - \hat{n})^2]$ of the discrete random variable $n$ associated with the Binomial Distribution with $p$ known is:

$$\sigma_n^2 \equiv \mathrm{var}(n) = E[(n-\hat{n})^2] = E[n^2] - \left(E[n]\right)^2 = \sum_{n=0}^{N} n^2\cdot P(n; N, p) - (Np)^2$$

$$= \sum_{n=0}^{N} n^2\cdot\frac{N!}{n!\,(N-n)!}\, p^n (1-p)^{N-n} - (Np)^2 = Np(1-p)$$
How are these results obtained? Let us define a random variable $n_i$ for the $i$-th independent coin-toss experiment:

$$n_i = \begin{cases} 1 & \text{if } A \text{ (success/heads) occurs (with known probability } p) \\ 0 & \text{if } \bar{A} \text{ (failure/tails) occurs (with known probability } q = 1 - p) \end{cases}$$
The expectation value of $n_i$ for one experimental trial (coin toss) with $p$ a priori known is the sum over the (two) possible values of $n_i$, weighted by the probabilities of each of the two possible values of $n_i$:

$$E[n_i] = \hat{n}_i = 1\cdot p + 0\cdot q = 1\cdot p + 0\cdot(1-p) = p$$

The number of times heads has occurred in a given set of $N$ independent coin-toss experiments is:
$$n = n_1 + n_2 + \cdots + n_N = \sum_{i=1}^{N} n_i \quad\text{and thus:}\quad \hat{n} = E[n] = E\!\left[\sum_{i=1}^{N} n_i\right] = E[n_1 + n_2 + \cdots + n_N] = \sum_{i=1}^{N} E[n_i] = Np$$
We can also calculate the variance of $n = n_1 + n_2 + \cdots + n_N$ in a similar manner:

$$\sigma_n^2 \equiv \mathrm{var}(n) = E[(n-\hat{n})^2] = E[(n - E[n])^2] = E\left[\left\{(n_1 - p) + (n_2 - p) + \cdots + (n_N - p)\right\}^2\right]$$

Defining: $\xi_i \equiv n_i - p = n_i - E[n_i]$, then: $\xi = \xi_1 + \xi_2 + \cdots + \xi_N = n - Np$ and, remembering that the $n_i$ (and hence the $\xi_i$) are independent, also noting that $E[\xi_i] = E[n_i - p] = E[n_i] - p = p - p = 0$:

$$\sigma_n^2 \equiv \mathrm{var}(n) = E\left[\left\{\xi_1 + \xi_2 + \cdots + \xi_N\right\}^2\right] = E[\xi_1^2] + E[\xi_2^2] + \cdots + E[\xi_N^2] = N\,E[\xi_i^2] = \sum_{i=1}^{N} E[\xi_i^2]$$
Now the $\xi_i \equiv n_i - p$ can take on a finite set of values:

$$\xi_i \equiv n_i - p = \begin{cases} 1 - p = q & \text{if } A \text{ (success/heads) occurs (with known probability } p) \\ 0 - p = -p & \text{if } \bar{A} \text{ (failure/tails) occurs (with known probability } q = 1 - p) \end{cases}$$
Thus, the expectation value of $\xi_i^2$ with $p$ a priori known, for one experimental trial (coin toss), is the sum over the (two) possible values of $\xi_i^2$, weighted by the probabilities of each of the two possible values of $\xi_i$:

$$E[\xi_i^2] = q^2\cdot p + (-p)^2\cdot q = q^2 p + p^2 q = qp\,(q + p) = qp\cdot 1 = qp = p(1-p)$$
Hence, the variance associated with the Binomial Distribution $P(n; N, p)$ with $p$ a priori known is:

$$\sigma_n^2 \equiv \mathrm{var}(n) = \sum_{i=1}^{N} E[\xi_i^2] = \sum_{i=1}^{N} qp = Nqp = Np(1-p) \quad\text{and thus:}\quad \sigma_n = \sqrt{Nqp} = \sqrt{Np(1-p)}$$
The "frequency of success" (aka success fraction) with $p$ a priori known is defined as: $f_N \equiv n/N$.

Then:

$$E[f_N] = \hat{f}_N = \frac{E[n]}{N} = \frac{Np}{N} = p$$

and:

$$\sigma_{f_N}^2 = \frac{\sigma_n^2}{N^2} = \frac{Npq}{N^2} = \frac{pq}{N} = \frac{p(1-p)}{N} \;\Rightarrow\; \sigma_{f_N} = \frac{\sigma_n}{N} = \sqrt{\frac{pq}{N}} = \sqrt{\frac{p(1-p)}{N}}$$
Summary of Results for the Binomial Distribution (when $p$ is a priori known):

In a situation in which either $A$ (= success/heads) or $\bar{A}$ (= failure/tails) occurs, then given $p$ = a priori known probability for $A$ to occur, and thus $q = 1 - p$ = known probability for $\bar{A}$ to occur, in $N$ independent trials the probability $P(n; N, p)$ of obtaining $n$ successes is given by:

$$P(n; N, p) = \frac{N!}{n!\,(N-n)!}\, p^n q^{N-n} = \frac{N!}{n!\,(N-n)!}\, p^n (1-p)^{N-n} = \binom{N}{n}\, p^n (1-p)^{N-n}$$

with Binomial Coefficient: $\displaystyle\binom{N}{n} \equiv \frac{N!}{n!\,(N-n)!}$
In the table below, we summarize some of the most frequently-used statistical parameters associated with the Binomial Distribution, for the random variables $n$ = number of successes and $f_N$ = frequency of success:

Expectation value: $E[n] = \hat{n} = Np$; $\quad E[f_N] = p$
Variance: $\sigma_n^2 = Np(1-p)$; $\quad \sigma_{f_N}^2 = p(1-p)/N$
Standard deviation: $\sigma_n = \sqrt{Np(1-p)}$; $\quad \sigma_{f_N} = \sqrt{p(1-p)/N}$
Skewness: $\gamma_1 = \dfrac{1-2p}{\sqrt{Np(1-p)}}$ (the same for $n$ and $f_N$)
Excess kurtosis: $\gamma_2 = \dfrac{1 - 6p(1-p)}{Np(1-p)}$ (the same for $n$ and $f_N$)
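These tabulated moments can be cross-checked numerically; a sketch assuming example values of $N$ and $p$, using scipy.stats.binom.stats (an assumed tool, not something the notes rely on):

```python
from scipy.stats import binom

# binom.stats returns mean, variance, skewness and excess kurtosis:
N, p = 20, 0.3                     # assumed example values
mean, var, skew, kurt = binom.stats(N, p, moments='mvsk')
print(mean, var)                   # Np = 6.0,  Np(1-p) = 4.2
print(skew)                        # (1 - 2p)/sqrt(Np(1-p)) ~ 0.1952
print(kurt)                        # (1 - 6p(1-p))/(Np(1-p)) ~ -0.0619
```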
A Binomial Statistics Example:

A "parent nucleus" decays to either "daughter nucleus" A or "daughter nucleus" B.

Suppose the probability to decay to A is known to be quite small: $p_A \simeq 5\times10^{-3}$, e.g. because the same experiment previously measured the probability to decay to B, $p_B = q_A = (1 - p_A) \simeq 0.995$.

Suppose further that we wish to directly measure $p_A$ accurately, say to 1% (because $p_B = q_A \simeq 0.995$ is already known, e.g. to 0.005%), i.e. we wish to explicitly check for self-consistency: does $p_A = q_B$? (Perhaps there are as-yet undiscovered decay modes...)
How many radioactive decays $N_A$ of parent nuclei must we observe in order to achieve this accuracy on $p_A$?

Here we are trying to measure $f_{N_A}$, whose expectation value is $p_A$, to an accuracy $\sigma_{f_{N_A}} \le 0.01\,p_A$.

$$\therefore\quad \sigma_{f_{N_A}} = \sqrt{\frac{p_A q_A}{N_A}} = \sqrt{\frac{p_A(1 - p_A)}{N_A}} \le 0.01\,p_A$$

or:

$$N_A \ge \frac{p_A(1-p_A)}{\left(0.01\,p_A\right)^2} = \frac{(1 - p_A)}{(0.01)^2\,p_A} \cong \frac{1}{(0.01)^2\,p_A} \quad\text{since } p_A \simeq 5\times10^{-3} \ll 1$$

$$\therefore\quad N_A \ge \frac{1}{(0.01)^2\,p_A} = \frac{10^4}{5\times10^{-3}} = \frac{10^7}{5} = 0.2\times10^7 = 2\times10^6 \text{ decays.}$$
Since $\sigma_{f_{N_A}} \le 0.01\,p_A \simeq 5\times10^{-5}$ and $\sigma_{f_{N_B}} \le 5\times10^{-5}\,p_B \simeq 5\times10^{-5}$, and since the same experiment measured both $p_A$ (for A) and $p_B$ (for B) with their accompanying uncertainties, an explicit consistency check is to see if, experimentally, $(p_A + p_B) = 1$? Or, as a null-test: $\Delta \equiv (p_A + p_B) - 1 = 0$? The 1-sigma uncertainty on $\Delta$ is $\sigma_\Delta = \sqrt{\sigma_{f_{N_A}}^2 + \sigma_{f_{N_B}}^2} \simeq \sqrt{2}\cdot 5\times10^{-5}$.
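The arithmetic of this example as a short sketch (the numbers are those assumed above):

```python
import numpy as np

p_A = 5e-3
N_A = (1 - p_A) / (0.01**2 * p_A)   # required number of parent decays
print(N_A)                          # 1.99e6, i.e. ~2e6 decays

sigma_fA = 0.01 * p_A               # 5e-5
sigma_fB = 5e-5                     # from the assumed 0.005% accuracy on p_B
print(np.hypot(sigma_fA, sigma_fB)) # sqrt(2) * 5e-5 ~ 7.1e-5, the 1-sigma on Delta
```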
For experimental situations where $p$ is small, then $q = 1 - p \simeq 1$, hence $\sigma_{f_N} \simeq \sqrt{p/N}$, and thus the fractional uncertainty associated with the frequency of success $f_N$ in the small-$p$ regime is:

$$\frac{\sigma_{f_N}}{p} \simeq \frac{1}{p}\sqrt{\frac{p}{N}} = \frac{1}{\sqrt{Np}} = \frac{1}{\sqrt{\hat{n}}}$$

In other words, the fractional uncertainty of the frequency of success $\hat{f}_N \equiv \hat{n}/N$ equals 1 over the square root of the number of successes, $\hat{n}$ (e.g. $\hat{n} = 10^4$ observed successes gives a 1% fractional uncertainty, as in the example above).
In the small-$p$ regime, the standard deviation associated with the number of successes $n$ for $N$ independent trials is: $\sigma_n = \sqrt{Npq} = \sqrt{Np(1-p)} \cong \sqrt{Np} = \sqrt{\hat{n}}$. This is often called "the statistical uncertainty on the number of successes" or, for short, "the uncertainty on the number of events".

Thus, the statistical uncertainty on the number of events $\sigma_n$ equals the square root of the number of observed events, $\sqrt{\hat{n}}$. The above expressions are valid only in the small-$p$ regime, i.e. when $p$, the probability that the given event will occur, is small ($p \ll 1$). Since from $N$ trials we have a sample of $n$ events, and $p \simeq n/N$, we see that $p$ is "small" only if $N$ is "large". We keep these ideas in mind as we look at the Poisson Distribution... shortly...

What do we do in a situation involving the experimental measurement of a known-to-be Binomial-type process, such as the radioactive decays of nuclei, when $p$ is a priori unknown? Experimentally, we would e.g. observe/measure a (large) number $n$ of (successful) independent decays of $N$ nuclei (= "# trials" here, in this example) in some (large) time interval $\Delta t$. An unbiased estimate of the decay (= success) probability per nucleus, $p$, is the sample mean $\hat{p} = n/N$; it is an estimator of the true, but a priori unknown, $p$, which (from the Law of Large Numbers) satisfies $\lim_{N\to\infty}\hat{p} = p$, the true $p$, in the limit $N \to \infty$ (or equivalently $n \to \infty$ {because $\Delta t \to \infty$}). The corresponding unbiased estimate of the survival (= "failure") probability per nucleus is $\hat{q} = 1 - \hat{p} = 1 - n/N$.
The probability $P(n; N, \hat{p})$ of obtaining $n$ successes in $N$ independent trials when the true $p$ (and hence the true $q$) are unknown is {also} given by the following Binomial Distribution:

$$P(n; N, \hat{p}) = \frac{N!}{n!\,(N-n)!}\, \hat{p}^n \hat{q}^{N-n} = \frac{N!}{n!\,(N-n)!}\, \hat{p}^n (1-\hat{p})^{N-n} = \binom{N}{n}\, \hat{p}^n (1-\hat{p})^{N-n}$$
The sample mean of this Binomial Distribution is:

$$\bar{n} \equiv E[n] \equiv \mu_n = \sum_{k=0}^{N} k\cdot P(k; N, \hat{p}) = \sum_{k=0}^{N} k\cdot\frac{N!}{k!\,(N-k)!}\, \hat{p}^k \hat{q}^{N-k} = \sum_{k=0}^{N} k\cdot\frac{N!}{k!\,(N-k)!}\, \hat{p}^k (1-\hat{p})^{N-k} = N\hat{p}$$
The (unbiased) variance of this Binomial Distribution (= an estimator of the true variance, $\sigma_n^2 = Np(1-p)$) when $p$ is a priori unknown is:

$$s_N^2 \equiv \frac{N}{N-1}\left[N\hat{p}\,(1-\hat{p})\right] \cong N\hat{p}\,(1-\hat{p}) \quad\text{for } N \to \infty$$
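A simulation sketch of estimating an unknown $p$ (the "true" $p$, the number of trials and the seed below are all assumed, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)         # assumed seed, for reproducibility

p_true, N = 5e-3, 2_000_000            # "true" p assumed for the simulation
n = rng.binomial(N, p_true)            # one experiment: n successes in N trials

p_hat = n / N                          # sample-mean estimate of p
s2 = (N / (N - 1)) * N * p_hat * (1 - p_hat)   # unbiased variance estimate
print(p_hat)                           # ~0.005
print(np.sqrt(s2))                     # ~sqrt(N p(1-p)) ~ 99.7
```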
The Multinomial Probability Distribution:

The generalization of the Binomial Probability Distribution, in which there are only two outcomes (digital 1/0, yes/no, success/failure) per trial, to multiple possible outcomes per trial (such as rolling a die, with 6 possible outcomes, or pulling a card from a deck of 52 cards, etc.) is the Multinomial Probability Distribution.
If there are $m$ possible outcomes per trial with corresponding known probability $p_i$ of obtaining the $i$-th possible outcome per trial, then since one of the $m$ possible outcomes per trial must be realized, we have the constraint relation:

$$\sum_{i=1}^{m} p_i = 1$$
If $N$ total trials are carried out, each individual trial will yield one of the $m$ possible outcomes. The probability for a particular/specific sequence of outcomes, e.g. outcome $i$ on the 1st trial, outcome $j$ on the 2nd trial, etc., is the product of the $N$ corresponding probabilities $p_i \cdot p_j \cdots p_k$, resulting in $n_1$ outcomes of the 1st type, $n_2$ outcomes of the 2nd type, etc. The number of such sequences in $N$ total trials that result in $n_1$ outcomes of the 1st type, $n_2$ outcomes of the 2nd type, etc. is the multinomial factor:

$$\frac{N!}{n_1!\, n_2! \cdots n_m!}$$
If we do not care about the exact sequence/order of the $m$ possible outcomes, just the total numbers of outcomes of each type ($n_i$, $i = 1{:}m$) in $N$ total trials, then the joint probability for $n_1$ outcomes of the 1st type, $n_2$ outcomes of the 2nd type, etc. in $N$ total trials is the Multinomial Probability Distribution:

$$P(n_1, n_2, \ldots, n_m; N, p_1, p_2, \ldots, p_m) = \frac{N!}{n_1!\, n_2! \cdots n_m!}\, p_1^{n_1} p_2^{n_2} \cdots p_m^{n_m} = \underbrace{\binom{N}{n_1, n_2, \ldots, n_m}}_{\text{multinomial coefficient}} \prod_{i=1}^{m} p_i^{n_i}$$
Since the Multinomial Distribution is associated with $m$ possible outcomes, a complete/full graph/plot of $P(n_1, n_2, \ldots, n_m; N, p_1, p_2, \ldots, p_m)$ can only be done in $m$ dimensions... Taking projections of it onto e.g. the $n_i$ axis, the expectation value/mean of the Multinomial Distribution, viewed from this $n_i$ variable, when the $p_i$ are a priori known, is:

$$E[n_i] = \hat{n}_i = Np_i$$
The variance and covariance (!) of the Multinomial Probability Distribution, when the $p_i$ are a priori known, are:

$$\sigma_{n_i}^2 = \mathrm{var}(n_i) = E\left[(n_i - E[n_i])^2\right] = Np_i\,(1 - p_i)$$

and:

$$\mathrm{cov}(n_i, n_j) = E\left[(n_i - E[n_i])(n_j - E[n_j])\right] = -Np_i\, p_j \quad (i \ne j)$$
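A short sketch of these multinomial moments, assuming a fair six-sided die as the example process (scipy.stats.multinomial is an assumed tool, not something the notes use):

```python
import numpy as np
from scipy.stats import multinomial

N = 60
p = np.full(6, 1 / 6)              # fair six-sided die (assumed example)

rv = multinomial(N, p)
print(rv.pmf(np.full(6, 10)))      # probability of the perfectly flat outcome

print(N * p)                       # E[n_i] = N p_i = 10 for every face
cov = rv.cov()                     # diag: N p_i (1-p_i); off-diag: -N p_i p_j
print(np.round(cov, 3))            # off-diagonal entries all -60/36 ~ -1.667
```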
If the $m$ possible outcomes associated with the Multinomial Probability Distribution are grouped together into only two possible categories, e.g. $\{n_1\}$ (= "success") and $\{n_2, n_3, \ldots, n_m\}$ (= "failure"), then this situation reduces to a Binomial process with $q_1 \equiv (1 - p_1) = (p_2 + p_3 + \cdots + p_m)$:

$$P(n_1; N, p_1) = \frac{N!}{n_1!\,(N - n_1)!}\, p_1^{n_1}\, q_1^{N - n_1} = \frac{N!}{n_1!\,(N - n_1)!}\, p_1^{n_1}\, (1 - p_1)^{N - n_1}$$
If the $m$ possible outcomes associated with the Multinomial Probability Distribution are grouped together into e.g. three possible categories, e.g. $\{n_1\}$, $\{n_2\}$ and everything else $\{n_3, n_4, \ldots, n_m\}$, the joint probability for $n_1$ outcomes of the 1st type, $n_2$ outcomes of the 2nd type, and $N - n_1 - n_2$ outcomes for everything else in $N$ total trials is:

$$P(n_1, n_2; N, p_1, p_2) = \frac{N!}{n_1!\, n_2!\,(N - n_1 - n_2)!}\, p_1^{n_1}\, p_2^{n_2}\, (1 - p_1 - p_2)^{N - n_1 - n_2}$$
As another example of a Multinomial Probability Distribution, consider the probability of obtaining a particular/specific histogram associated with $N$ independent measurements of an (arbitrary) random variable $x$.

The histogram of the random variable $x$ has $m$ bins. For the $N$ independent measurements of the random variable $x$, there will be $n_1$ entries in the 1st bin, $n_2$ entries in the 2nd bin, etc. Since the covariance of the Multinomial Probability Distribution is negative:

$$\mathrm{cov}(n_i, n_j) = -Np_i\, p_j \quad (i \ne j)$$

This tells us/physically means that an upward (downward) fluctuation in the number of entries $n_i$ in the $i$-th histogram bin of the random variable $x$ will tend to be accompanied by a downward (upward) fluctuation, respectively, in the number of entries $n_j$ in the $j$-th histogram bin.

As an experimenter, one could become (explicitly) aware of this fact e.g. if the entire/complete experiment, consisting of a set of $N$ measurements, is repeated many times (e.g. $10^6$ complete experiments). One can also gain some insights into such bin-correlation effects e.g. by histogramming the original data with a coarser bin size (i.e. fewer bins), or e.g. by using a histogram that has the same number of bins as the original histogram but systematically shifting the bin boundaries by integer multiples {1:9} of 1/10th of the bin width.

For a single experiment consisting of $N$ measurements, if one is attempting to extract physical meaning e.g. from a particular/specific, finite-small subset of adjacent/contiguous histogram bins, it is very well worth knowing/quantitatively understanding what the expected level of fluctuations in the entries of the histogram bins should be, due to such correlations... A simulation sketch follows below.
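A minimal simulation sketch of this bin anticorrelation (all numbers below are assumed): repeat the complete experiment many times and compute the bin-to-bin correlation matrix; with $m$ equiprobable bins, the expected off-diagonal correlation is $-p/(1-p) = -1/(m-1)$:

```python
import numpy as np

rng = np.random.default_rng(1)         # assumed seed

N, m, n_expts = 1000, 10, 100_000      # assumed: 1000 entries, 10 bins, 1e5 experiments
p = np.full(m, 1 / m)                  # uniform bin probabilities (assumed)

counts = rng.multinomial(N, p, size=n_expts)   # (n_expts, m) array of bin contents
corr = np.corrcoef(counts, rowvar=False)       # m x m bin-to-bin correlation matrix

# Off-diagonal entries cluster around -1/(m-1) ~ -0.111 here:
print(np.round(corr, 3))
```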
e.g. Do you see any interesting, unaccounted-for "bumps" in the inclusive $\mu^+\mu^-$ invariant mass spectra, courtesy of the ATLAS and CMS experiments at the LHC?

[Figures: inclusive $\mu^+\mu^-$ invariant mass spectra from the ATLAS and CMS experiments at the LHC.]