
Fall 2012 Analysis of Experimental Measurements B. Eisenstein/rev. S. Errede

P598AEM Lecture Notes 09

The Multivariate Gaussian/Normal Distribution:

Given $N$ centered (i.e. $\hat{x}_i = 0$) random variables $x_1, x_2, \ldots, x_N$:

$$
\vec{x} \equiv \begin{pmatrix} x_1 \\ \vdots \\ x_N \end{pmatrix}
\quad\text{with:}\quad
\hat{\vec{x}} \equiv E[\vec{x}\,] = \begin{pmatrix} \hat{x}_1 \\ \vdots \\ \hat{x}_N \end{pmatrix} = \vec{0}
\quad (= \text{the zero \{aka the null\} vector})
$$

Suppose the $x_i$ are all Gaussian/normally-distributed and independent. Then their joint P.D.F. will be:

$$
\begin{aligned}
f(x_1, x_2, \ldots, x_N) &= f(x_1)\cdot f(x_2)\cdots f(x_N)
 = \frac{1}{\sqrt{2\pi}\,\sigma_1}\,e^{-x_1^2/2\sigma_1^2}\cdot\frac{1}{\sqrt{2\pi}\,\sigma_2}\,e^{-x_2^2/2\sigma_2^2}\cdots\frac{1}{\sqrt{2\pi}\,\sigma_N}\,e^{-x_N^2/2\sigma_N^2}\\[4pt]
&= \frac{1}{\left(2\pi\right)^{N/2}\sigma_1\sigma_2\cdots\sigma_N}\;
   e^{-\frac{1}{2}\left(x_1^2/\sigma_1^2 + x_2^2/\sigma_2^2 + \cdots + x_N^2/\sigma_N^2\right)}
 = \frac{1}{\left(2\pi\right)^{N/2}\prod_{i=1}^{N}\sigma_i}\;
   e^{-\frac{1}{2}\sum_{i=1}^{N} x_i^2/\sigma_i^2}
\end{aligned}
$$

As we discussed previously in P598AEM Lect. Notes 6 (p. 15), we can write the argument of the exponential function as the matrix product $-\tfrac{1}{2}\,\vec{x}^{\,T}\hat{V}_x^{-1}\vec{x}$, where $\hat{V}_x^{-1}$ is the inverse of the x-basis variance matrix $\hat{V}_x$:

$$
\hat{V}_x^{-1} \equiv \begin{pmatrix}
1/\sigma_{x_1}^2 & 0 & \cdots & 0\\
0 & 1/\sigma_{x_2}^2 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & 1/\sigma_{x_N}^2
\end{pmatrix}
\qquad\text{and:}\qquad
\hat{V}_x \underset{\text{here}}{\equiv} E[\vec{x}\,\vec{x}^{\,T}] \equiv \begin{pmatrix}
\sigma_{x_1}^2 & 0 & \cdots & 0\\
0 & \sigma_{x_2}^2 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & \sigma_{x_N}^2
\end{pmatrix}
$$

with: $\hat{V}_x\hat{V}_x^{-1} = \hat{V}_x^{-1}\hat{V}_x = \mathbb{1}$

Note further that:

$$
\left(\det\hat{V}_x^{-1}\right)^{1/2} = \frac{1}{\sqrt{\det\hat{V}_x}} = \frac{1}{\sigma_1\sigma_2\cdots\sigma_N} = \frac{1}{\prod_{i=1}^{N}\sigma_i}
$$

Thus, for independent $x_1, x_2, \ldots, x_N$:

$$
f(\vec{x}\,) \equiv f(x_1, x_2, \ldots, x_N)
 = \frac{1}{\left(2\pi\right)^{N/2}\prod_{i=1}^{N}\sigma_i}\;e^{-\frac{1}{2}\sum_{i=1}^{N} x_i^2/\sigma_i^2}
 = \frac{1}{\left(2\pi\right)^{N/2}\sqrt{\det\hat{V}_x}}\;e^{-\frac{1}{2}\vec{x}^{\,T}\hat{V}_x^{-1}\vec{x}}
$$
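As a quick numerical cross-check of the last equality (a sketch, not part of the original notes; the $\sigma_i$ and the sample point below are assumed example values), the following NumPy snippet evaluates both forms of the P.D.F. and shows they agree:

```python
# Hedged numerical check (assumed sigma_i and sample point) that the product of independent
# 1-D Gaussian P.D.F.s equals the matrix form exp(-x^T V^-1 x / 2) / sqrt((2 pi)^N det V).
import numpy as np

sigma = np.array([0.5, 1.0, 2.0])        # assumed sigma_i for N = 3
x     = np.array([0.3, -1.2, 0.7])       # assumed (centered) sample point

# Product of N one-dimensional Gaussians
f_prod = np.prod(np.exp(-x**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma))

# Matrix form with the diagonal x-basis variance matrix V_x
V_x   = np.diag(sigma**2)
N     = len(x)
f_mat = np.exp(-0.5 * x @ np.linalg.inv(V_x) @ x) / np.sqrt((2.0 * np.pi)**N * np.linalg.det(V_x))

print(f_prod, f_mat)                     # identical up to rounding
```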


From discussions in previous P598AEM lectures (and HW), we know that if we start with independent random variables $x_1, x_2, \ldots, x_N$ and then make a change to a new set of random variables $y_1, y_2, \ldots, y_N$, the new random variables are (usually) dependent. We can also “go the other way”:

Suppose that the $y_1, y_2, \ldots, y_N$ are $N$ centered (i.e. $\hat{y}_i = 0$) random variables, whose covariance matrix is:

$$
\hat{V}_y \underset{\text{here}}{\equiv} E[\vec{y}\,\vec{y}^{\,T}] \equiv \begin{pmatrix}
\sigma_{y_1}^2 & \mathrm{cov}(y_1, y_2) & \cdots & \mathrm{cov}(y_1, y_N)\\
\mathrm{cov}(y_2, y_1) & \sigma_{y_2}^2 & \cdots & \mathrm{cov}(y_2, y_N)\\
\vdots & \vdots & \ddots & \vdots\\
\mathrm{cov}(y_N, y_1) & \mathrm{cov}(y_N, y_2) & \cdots & \sigma_{y_N}^2
\end{pmatrix}
$$

Since $\hat{V}_y$ is real and symmetric, it can be diagonalized by an Orthogonal Transformation $C$, i.e. there exists an orthogonal matrix $C$ for which $C\hat{V}_y C^{T}$ is diagonal.

In that case, we can also define: $\vec{x} = C\vec{y}$. Then the diagonal x-basis variance matrix is:

$$
\hat{V}_x \equiv E[\vec{x}\,\vec{x}^{\,T}] = E[(C\vec{y}\,)(C\vec{y}\,)^{T}] = E[C\vec{y}\,\vec{y}^{\,T}C^{T}] = C\,E[\vec{y}\,\vec{y}^{\,T}]\,C^{T} = C\hat{V}_y C^{T}
$$

Thus, we see that the random variables $x_i$ are indeed independent. Conclusion: since one can go from any set of $N$ y-basis random variables to $N$ independent x-basis random variables by means of an orthogonal transformation, the reverse (i.e. the inverse transformation) must also hold. That is, starting with a set of $N$ independent x-basis random variables, we can go to any set of $N$ non-independent y-basis random variables via some orthogonal transformation (matrix) $C$ (e.g. an N-dimensional rotation of basis vectors).

Since $C$ is an orthogonal matrix, then $C^{-1} \equiv C^{T}$, and therefore if $\vec{x} = C\vec{y}$ then $\vec{y} = C^{-1}\vec{x} = C^{T}\vec{x}$.

Proof: $\vec{x} = C\vec{y} \;\Rightarrow\; C^{-1}\vec{x} = C^{-1}C\vec{y} = \vec{y} \;\Rightarrow\; \vec{y} = C^{-1}\vec{x} = C^{T}\vec{x}$. Q.E.D.
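A small numerical sketch of this diagonalization step (not part of the original notes; the covariance matrix below is an assumed example). Since $\hat{V}_y$ is real and symmetric, numpy.linalg.eigh returns an orthogonal eigenvector matrix, and taking $C$ as its transpose makes $C\hat{V}_y C^{T}$ diagonal:

```python
# Sketch: diagonalize an assumed y-basis covariance matrix with an orthogonal matrix C,
# so that x = C y has a diagonal variance matrix V_x = C V_y C^T.
import numpy as np

V_y = np.array([[ 4.0,  1.2, -0.8],
                [ 1.2,  2.0,  0.5],
                [-0.8,  0.5,  1.0]])          # assumed (real, symmetric) covariance matrix

eigvals, eigvecs = np.linalg.eigh(V_y)        # V_y = eigvecs @ diag(eigvals) @ eigvecs.T
C = eigvecs.T                                 # orthogonal: C @ C.T = 1

V_x = C @ V_y @ C.T                           # diagonal x-basis variance matrix
print(np.round(V_x, 12))                      # off-diagonal covariances vanish
print(np.allclose(C @ C.T, np.eye(3)))        # True: C is orthogonal
```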

Let us now return to the P.D.F. for independent, centered $x_1, x_2, \ldots, x_N$:

$$
f(\vec{x}\,) = \frac{1}{\left(2\pi\right)^{N/2}\sqrt{\det\hat{V}_x}}\;e^{-\frac{1}{2}\vec{x}^{\,T}\hat{V}_x^{-1}\vec{x}}
$$

Let us make an orthogonal change of random variables and find the P.D.F. for $\vec{y}$.


Setting $\vec{x} = C\vec{y}$ (and $\therefore\ \vec{x}^{\,T} = (C\vec{y}\,)^{T} = \vec{y}^{\,T}C^{T}$), then:

$$
\vec{x}^{\,T}\hat{V}_x^{-1}\vec{x} = \vec{y}^{\,T}C^{T}\hat{V}_x^{-1}C\,\vec{y}
\quad\text{but:}\quad
C^{T}\hat{V}_x^{-1}C = \hat{V}_y^{-1}
$$

Thus: $\vec{x}^{\,T}\hat{V}_x^{-1}\vec{x} = \vec{y}^{\,T}C^{T}\hat{V}_x^{-1}C\,\vec{y} = \vec{y}^{\,T}\hat{V}_y^{-1}\vec{y}$.

From above, the diagonal x-basis variance matrix is: $\hat{V}_x = C\hat{V}_y C^{T}$.

Multiplying this relation on the LHS by $C^{T}$ and on the RHS by $C$, we get:

$$
C^{T}\hat{V}_x C = C^{T}C\,\hat{V}_y\,C^{T}C
\quad\text{and since}\quad
C^{T}C = CC^{T} = C^{-1}C = CC^{-1} = \mathbb{1}\;\;(\text{the unit matrix}),
$$

then: $\hat{V}_y = C^{T}\hat{V}_x C$, the y-basis covariance matrix.

Check – does $\hat{V}_y\hat{V}_y^{-1} = \mathbb{1}$? (n.b. we already showed that $\hat{V}_x\hat{V}_x^{-1} = \hat{V}_x^{-1}\hat{V}_x = \mathbb{1}$ above)

$$
\hat{V}_y\hat{V}_y^{-1}
 = \left(C^{T}\hat{V}_x C\right)\left(C^{T}\hat{V}_x^{-1}C\right)
 = C^{T}\hat{V}_x\underbrace{CC^{T}}_{=\,\mathbb{1}}\hat{V}_x^{-1}C
 = C^{T}\underbrace{\hat{V}_x\hat{V}_x^{-1}}_{=\,\mathbb{1}}C
 = C^{T}C = \mathbb{1}
\qquad\text{Yes!}
$$

Using $\det AB = \det A\cdot\det B$:

$$
\det\hat{V}_y = \det\!\left(C^{T}\hat{V}_x C\right) = \det\!\left(C^{T}\right)\cdot\det\!\left(\hat{V}_x\right)\cdot\det\!\left(C\right)
$$

But: $\det\!\left(C^{T}\right)\cdot\det\!\left(C\right) = \det\!\left(C^{-1}\right)\cdot\det\!\left(C\right) = 1$ (since $C$ is orthogonal).

Thus: $\det\hat{V}_y = \det\hat{V}_x$.

We also need the Jacobian (to derive the P.D.F. for $\vec{y}$). Since $\vec{y} = C^{-1}\vec{x} = C^{T}\vec{x}$:

$$
J \equiv \frac{\partial\left(y_1, y_2, \ldots, y_N\right)}{\partial\left(x_1, x_2, \ldots, x_N\right)}
 = \det\begin{pmatrix}
\dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_1}{\partial x_2} & \cdots & \dfrac{\partial y_1}{\partial x_N}\\[6pt]
\dfrac{\partial y_2}{\partial x_1} & \dfrac{\partial y_2}{\partial x_2} & \cdots & \dfrac{\partial y_2}{\partial x_N}\\
\vdots & \vdots & \ddots & \vdots\\
\dfrac{\partial y_N}{\partial x_1} & \dfrac{\partial y_N}{\partial x_2} & \cdots & \dfrac{\partial y_N}{\partial x_N}
\end{pmatrix}
 = \det C^{T} = \det C^{-1}
$$

$\Rightarrow\;\det C^{T} = \det C^{-1} = \pm 1$ (since $C$ is orthogonal), and thus $|J| = 1$.

Thus, when we change variables from the $x$'s to the $y$'s via an orthogonal transformation $C$:

$$
h(\vec{y}\,)\,d\vec{y} = f(\vec{x}\,)\,d\vec{x}
\;\;\Rightarrow\;\;
h(\vec{y}\,) = \frac{f(\vec{x}\,)}{\left|J\right|}
 = \frac{f(\vec{x}\,)}{\left|\det C^{T}\right|}
 = \frac{f(\vec{x}\,)}{\left|\det C^{-1}\right|}
 = \frac{f(\vec{x}\,)}{\left|\pm 1\right|}
 = f(\vec{x}\,)
$$

n.b. The determinant of an orthogonal matrix $C$ is $\det(C) = \pm 1$.
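As a quick numerical aside (a sketch, not from the original notes), a random orthogonal matrix generated from a QR decomposition indeed has determinant $\pm 1$, so the Jacobian of this change of variables is exactly 1:

```python
# Sketch: an orthogonal matrix (here the Q factor of a QR decomposition of an assumed random
# matrix) has det C = +/-1, so |J| = |det C^T| = 1 for the change of variables y = C^T x.
import numpy as np

rng = np.random.default_rng(1)
C, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # C (the Q factor) is orthogonal
print(np.linalg.det(C))                        # +1 or -1 (up to rounding)
print(np.allclose(C @ C.T, np.eye(4)))         # True
```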


Then the x-basis P.D.F.

$$
f(\vec{x}\,) \equiv f(x_1, x_2, \ldots, x_N) = \frac{1}{\left(2\pi\right)^{N/2}\sqrt{\det\hat{V}_x}}\;e^{-\frac{1}{2}\vec{x}^{\,T}\hat{V}_x^{-1}\vec{x}}
$$

becomes the y-basis P.D.F.

$$
h(\vec{y}\,) \equiv h(y_1, y_2, \ldots, y_N) = \frac{1}{\left(2\pi\right)^{N/2}\sqrt{\det\hat{V}_y}}\;e^{-\frac{1}{2}\vec{y}^{\,T}\hat{V}_y^{-1}\vec{y}}
$$

We can (of course) generalize these results to N non-centered x, y-random variables as well:

$$
f(\vec{x}\,) \equiv f(x_1, x_2, \ldots, x_N)
 = \frac{1}{\left(2\pi\right)^{N/2}\sqrt{\det\hat{V}_x}}\;e^{-\frac{1}{2}\left(\vec{x}-\hat{\vec{x}}\right)^{T}\hat{V}_x^{-1}\left(\vec{x}-\hat{\vec{x}}\right)}
 = \frac{1}{\left(2\pi\right)^{N/2}\sqrt{\det\hat{V}_x}}\;e^{-\frac{1}{2}\hat{R}_x^{T}\hat{V}_x^{-1}\hat{R}_x}
$$

$$
h(\vec{y}\,) \equiv h(y_1, y_2, \ldots, y_N)
 = \frac{1}{\left(2\pi\right)^{N/2}\sqrt{\det\hat{V}_y}}\;e^{-\frac{1}{2}\left(\vec{y}-\hat{\vec{y}}\right)^{T}\hat{V}_y^{-1}\left(\vec{y}-\hat{\vec{y}}\right)}
 = \frac{1}{\left(2\pi\right)^{N/2}\sqrt{\det\hat{V}_y}}\;e^{-\frac{1}{2}\hat{R}_y^{T}\hat{V}_y^{-1}\hat{R}_y}
$$

where the residual matrices are defined as: $\hat{R}_x \equiv (\vec{x}-\hat{\vec{x}})$ and $\hat{R}_y \equiv (\vec{y}-\hat{\vec{y}})$,

and the covariance matrices are defined as: $\hat{V}_x \equiv E[(\vec{x}-\hat{\vec{x}})(\vec{x}-\hat{\vec{x}})^{T}] = E[\hat{R}_x\hat{R}_x^{T}]$ and $\hat{V}_y \equiv E[(\vec{y}-\hat{\vec{y}})(\vec{y}-\hat{\vec{y}})^{T}] = E[\hat{R}_y\hat{R}_y^{T}]$.

We shall encounter the argument of the exponential function of the Gaussian/normal P.D.F., $(\vec{y}-\hat{\vec{y}})^{T}\hat{V}_y^{-1}(\vec{y}-\hat{\vec{y}}) = \hat{R}_y^{T}\hat{V}_y^{-1}\hat{R}_y$, again in the (near) future, when we develop the formalism (“Hypothesis Testing”) to ask in a rigorous, quantitative way whether N measurements of random variables $y_1, y_2, \ldots, y_N$ are consistent (or not) with a set of given/specified expectation values $\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_N$ and a given/specified variance matrix $\hat{V}_y$. In particular, we will define the “Chi-Squared” parameter as:

$$
\chi^2 \equiv \left(\vec{y}-\hat{\vec{y}}\right)^{T}\hat{V}_y^{-1}\left(\vec{y}-\hat{\vec{y}}\right) = \hat{R}_y^{T}\hat{V}_y^{-1}\hat{R}_y
$$
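An illustrative computation of this $\chi^2$ parameter (a sketch, not from the notes; measurements, expectation values and covariance below are assumed):

```python
# Sketch: chi^2 = (y - y_hat)^T V_y^-1 (y - y_hat) for assumed example inputs.
import numpy as np

y     = np.array([10.2, 4.8, 7.5])            # assumed measured values
y_hat = np.array([10.0, 5.0, 7.0])            # assumed expectation values
V_y   = np.diag([0.4**2, 0.3**2, 0.5**2])     # assumed covariance matrix (diagonal here)

R    = y - y_hat                              # residual vector R_y
chi2 = R @ np.linalg.inv(V_y) @ R
print(chi2)
```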


The Binomial Probability Distribution:

The Binomial Distribution – n.b. a discrete distribution – is relevant for/useful in situations where a fixed number $N$ of independent experimental trials is carried out, in which each individual trial has only two possible outcomes – e.g. flipping a coin: $A$, heads $\equiv$ success (with constant, known probability $p$) or $\bar{A}$, tails $\equiv$ failure (with constant, known probability $q \equiv 1-p$).

The probability $P(n; N, p)$ of obtaining $n$ successes in $N$ independent, individual trials is given by the Binomial Distribution:

$$
P(n; N, p) = \frac{N!}{n!\,(N-n)!}\,p^{n}q^{N-n} = \frac{N!}{n!\,(N-n)!}\,p^{n}\left(1-p\right)^{N-n} = \binom{N}{n}p^{n}\left(1-p\right)^{N-n}
$$

with Binomial Coefficient: $\displaystyle\binom{N}{n} \equiv \frac{N!}{n!\,(N-n)!}$

The $p^{n}$ term is the success probability associated with obtaining $n$ successes in $N$ independent, individual trials.

The $q^{N-n} = \left(1-p\right)^{N-n}$ term is the failure probability associated with the remaining $N-n$ unsuccessful/failed trials.

Note also that the joint probability $p^{n}\left(1-p\right)^{N-n}$ corresponds to only one possible ordering/sequence of $n$ successes and $(N-n)$ failures in $N$ independent trials, since $\underbrace{p\,p\,q\,q\cdots p}_{N}$ and $\underbrace{p\,q\,q\,p\,p\cdots}_{N}$, etc., are equivalent contributions to $P(n; N, p)$.

The factorial term $N!/\left[n!\,(N-n)!\right]$ thus gives the total # of all possible unique/distinct permutations of the ordering/sequence of $n$ successes and $(N-n)$ failures in $N$ independent trials. We thus see that $n$ is a random variable associated with a fixed number $N$ of independent trials, and ranges (in discrete, integer steps) over $0 \le n \le N$.

In the two figures below, the LHS figure shows $P(n; N, p)$ vs. $n$ for three examples of the Binomial Distribution, two with $N = 20$ and $p = 0.5$ & $0.7$, and one with $N = 40$ and $p = 0.5$. The RHS figure shows the corresponding Cumulative Distribution for each of these three:

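A short sketch (not part of the original notes) that reproduces the curves described above, using scipy.stats.binom:

```python
# Sketch: the Binomial P.M.F. P(n; N, p) and its cumulative distribution for the three
# (N, p) combinations described in the figures above.
import numpy as np
from scipy.stats import binom

for N, p in [(20, 0.5), (20, 0.7), (40, 0.5)]:
    n   = np.arange(0, N + 1)
    pmf = binom.pmf(n, N, p)                  # P(n; N, p)
    cdf = binom.cdf(n, N, p)                  # cumulative distribution
    print(N, p, n[pmf.argmax()], pmf.max())   # mode and peak probability of each curve
```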


The expectation/mean value $\hat{n} \equiv E[n] \equiv \mu_n$ of the discrete random variable $n$ associated with the Binomial Distribution with $p$ known is obtained by summing $n\cdot P(n; N, p)$ over all allowed values of $n$ successes:

$$
\hat{n} \equiv E[n] \equiv \mu_n = \sum_{n=0}^{N} n\cdot P(n; N, p)
 = \sum_{n=0}^{N} n\cdot\frac{N!}{n!\,(N-n)!}\,p^{n}q^{N-n}
 = \sum_{n=0}^{N} n\cdot\frac{N!}{n!\,(N-n)!}\,p^{n}\left(1-p\right)^{N-n} = Np
$$

The variance $\sigma_n^2 \equiv \mathrm{var}(n) = E[(n-\hat{n})^2]$ of the discrete random variable $n$ associated with the Binomial Distribution with $p$ known is:

$$
\sigma_n^2 \equiv \mathrm{var}(n) = E[(n-\hat{n})^2] = E[n^2] - \left(E[n]\right)^2
 = \sum_{n=0}^{N} n^2\cdot P(n; N, p) - \left(Np\right)^2
 = \sum_{n=0}^{N} n^2\,\frac{N!}{n!\,(N-n)!}\,p^{n}\left(1-p\right)^{N-n} - \left(Np\right)^2
 = Np\left(1-p\right)
$$

How are these results obtained? Let us define a random variable $n_i$ for the $i$th independent coin-toss experiment:

$$
n_i = \begin{cases}
1 & \text{if } A\ (\text{success/heads})\ \text{occurs (with known probability } p)\\
0 & \text{if } \bar{A}\ (\text{failure/tails})\ \text{occurs (with known probability } q = 1-p)
\end{cases}
$$

The expectation value of $n_i$ for one experimental trial (coin toss) with $p$ apriori known is the sum over the (two) possible values of $n_i$, weighted by the probabilities of each of the two possible values of $n_i$:

$$
E[n_i] = \hat{n}_i = 1\cdot p + 0\cdot q = 1\cdot p + 0\cdot\left(1-p\right) = p
$$

The number of times heads has occurred in a given set of $N$ independent coin-toss experiments is:

$$
n = n_1 + n_2 + \cdots + n_N = \sum_{i=1}^{N} n_i
\quad\text{and thus:}\quad
E[n] = \hat{n} = E\!\left[\sum_{i=1}^{N} n_i\right] = E[n_1 + n_2 + \cdots + n_N] = \sum_{i=1}^{N} E[n_i] = Np.
$$

We can also calculate the variance of $n = n_1 + n_2 + \cdots + n_N$ in a similar manner:

$$
\sigma_n^2 \equiv \mathrm{var}(n) = E[(n-\hat{n})^2] = E[(n-E[n])^2]
 = E\!\left[\left\{(n_1 - p) + (n_2 - p) + \cdots + (n_N - p)\right\}^2\right]
$$

Defining $\xi_i \equiv n_i - p = n_i - E[n_i]$, then $\xi \equiv \xi_1 + \xi_2 + \cdots + \xi_N = n - Np$ and, remembering that the $n_i$ (and hence the $\xi_i$) are independent, also noting that $E[\xi_i] = E[n_i - p] = E[n_i] - p = p - p = 0$:

$$
\sigma_n^2 \equiv \mathrm{var}(n) = E[\xi^2]
 = E\!\left[\left\{\xi_1 + \xi_2 + \cdots + \xi_N\right\}^2\right]
 = E[\xi_1^2] + E[\xi_2^2] + \cdots + E[\xi_N^2]
 = \sum_{i=1}^{N} E[\xi_i^2]
$$

Now the $\xi_i \equiv n_i - p$ can take on a finite set of values:

$$
\xi_i \equiv n_i - p = \begin{cases}
1 - p = q & \text{if } A\ (\text{success/heads})\ \text{occurs (with known probability } p)\\
0 - p = -p & \text{if } \bar{A}\ (\text{failure/tails})\ \text{occurs (with known probability } q = 1-p)
\end{cases}
$$


Thus, the expectation value of $\xi_i^2$ with $p$ apriori known, for one experimental trial (coin toss), is the sum over the (two) possible values of $\xi_i^2$, weighted by the probabilities of each of the two possible values of $\xi_i$:

$$
E[\xi_i^2] = q^2\cdot p + \left(-p\right)^2\cdot q = q^2 p + p^2 q = qp\underbrace{\left(q + p\right)}_{=\,1} = qp = p\left(1-p\right)
$$

Hence, the variance associated with the Binomial Distribution $P(n; N, p)$ with $p$ apriori known is:

$$
\sigma_n^2 \equiv \mathrm{var}(n) = \sum_{i=1}^{N} E[\xi_i^2] = \sum_{i=1}^{N} qp = Nqp = Np\left(1-p\right)
\quad\text{and thus:}\quad
\sigma_n = \sqrt{\sigma_n^2} = \sqrt{Nqp} = \sqrt{Np\left(1-p\right)}
$$
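A quick Monte-Carlo sketch of the coin-toss construction just described (not from the notes; $N$ and $p$ are assumed example values), confirming $E[n] = Np$ and $\mathrm{var}(n) = Npq$:

```python
# Sketch: n = sum of N Bernoulli variables n_i; check E[n] = Np and var(n) = Npq numerically.
import numpy as np

rng  = np.random.default_rng(2)
N, p = 50, 0.3                                  # assumed values
tosses = rng.binomial(1, p, size=(100_000, N))  # each row: one experiment of N coin tosses
n = tosses.sum(axis=1)                          # number of successes per experiment

print(n.mean(), N * p)                          # ~ Np
print(n.var(),  N * p * (1 - p))                # ~ Npq
```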

The “frequency of success” (aka success fraction) with $p$ apriori known is defined as: $f_N \equiv n/N$.

Then: $E[f_N] = \hat{f}_N = \dfrac{E[n]}{N} = \dfrac{Np}{N} = p$ and:

$$
\sigma_{f_N}^2 = \frac{\sigma_n^2}{N^2} = \frac{Npq}{N^2} = \frac{pq}{N} = \frac{p\left(1-p\right)}{N}
\;\;\Rightarrow\;\;
\sigma_{f_N} = \frac{\sigma_n}{N} = \sqrt{\frac{pq}{N}} = \sqrt{\frac{p\left(1-p\right)}{N}}
$$

Summary of Results for the Binomial Distribution − when p is apriori known:

In a situation in which either $A$ (= success/heads) or $\bar{A}$ (= failure/tails) occurs, then given $p$ = apriori known probability for $A$ to occur, thus $q = 1-p$ = known probability for $\bar{A}$ to occur in $N$ independent trials, the probability $P(n; N, p)$ of obtaining $n$ successes in $N$ independent trials is given by:

$$
P(n; N, p) = \frac{N!}{n!\,(N-n)!}\,p^{n}q^{N-n} = \frac{N!}{n!\,(N-n)!}\,p^{n}\left(1-p\right)^{N-n} = \binom{N}{n}p^{n}\left(1-p\right)^{N-n}
$$

with Binomial Coefficient: $\displaystyle\binom{N}{n} \equiv \frac{N!}{n!\,(N-n)!}$

In the table below, we summarize some of the most frequently-used statistical parameters associated with the Binomial Distribution:

Random Variable:        $n$ = Number of Successes                      $f_N = n/N$ = Frequency of Success
Expectation Value:      $E[n] = \hat{n} = Np$                          $E[f_N] = p$
Variance:               $\sigma_n^2 = Np(1-p)$                         $\sigma_{f_N}^2 = p(1-p)/N$
Standard Deviation:     $\sigma_n = \sqrt{Np(1-p)}$                    $\sigma_{f_N} = \sqrt{p(1-p)/N}$
Skewness:               $\gamma_1 = \dfrac{1-2p}{\sqrt{Np(1-p)}}$      $\gamma_1(f_N) = \dfrac{1-2p}{\sqrt{Np(1-p)}}$
Excess Kurtosis:        $\gamma_2 = \dfrac{1-6p(1-p)}{Np(1-p)}$        $\gamma_2(f_N) = \dfrac{1-6p(1-p)}{Np(1-p)}$
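A brief numerical cross-check of the table entries for the number of successes $n$ (a sketch, not from the notes; $N$ and $p$ are assumed), using scipy.stats.binom:

```python
# Sketch: compare scipy's mean/variance/skewness/excess-kurtosis of a Binomial Distribution
# against the closed-form expressions in the table above.
import numpy as np
from scipy.stats import binom

N, p = 40, 0.3                                # assumed values
mean, var, skew, exkurt = binom.stats(N, p, moments='mvsk')

print(mean,   N * p)                          # Np
print(var,    N * p * (1 - p))                # Np(1-p)
print(skew,   (1 - 2 * p) / np.sqrt(N * p * (1 - p)))
print(exkurt, (1 - 6 * p * (1 - p)) / (N * p * (1 - p)))
```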


A Binomial Statistics Example:

A “parent nucleus” decays to either “daughter nucleus” A or “daughter nucleus” B.

Suppose the probability to decay to A is known to be quite small: $p_A \simeq 5\times10^{-3}$, e.g. because the same experiment previously measured the probability to decay to B, $p_B = q_A \simeq 0.995$ $(= 1 - p_A)$.

Suppose further that we wish to directly measure $p_A$ accurately, say to 1% (because $p_B = q_A \simeq 0.995$ is already known e.g. to 0.005% – i.e. we wish to explicitly check for self-consistency – does $p_A = q_B$? Perhaps there are as-yet undiscovered decay modes…).

How many radioactive decays $N_A$ of parent nuclei must we observe in order to achieve this accuracy on $p_A$?

Here we are trying to measure $f_{N_A}$, whose expectation value is $p_A$, to an accuracy $\sigma_{f_{N_A}} \le 0.01\,p_A$.

$$
\therefore\quad
\sigma_{f_{N_A}} = \sqrt{\frac{p_A q_A}{N_A}} = \sqrt{\frac{p_A\left(1-p_A\right)}{N_A}} \le 0.01\,p_A
$$

or:

$$
N_A \ge \frac{p_A q_A}{\left(0.01\,p_A\right)^2} = \frac{p_A\left(1-p_A\right)}{\left(0.01\right)^2 p_A^2} = \frac{\left(1-p_A\right)}{\left(0.01\right)^2 p_A} \cong \frac{1}{\left(0.01\right)^2 p_A}
\quad\text{since } p_A \simeq 5\times10^{-3} \ll 1
$$

$$
\therefore\quad
N_A \ge \frac{p_A}{\left(0.01\right)^2 p_A^2} = \frac{1}{\left(0.01\right)^2 p_A} = \frac{10^4}{5\times10^{-3}} = \frac{10^7}{5} = 0.2\times10^{7} = 2\times10^{6}\ \text{decays.}
$$
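The same arithmetic, as a small sketch (not from the notes; $p_A$ assumed $= 5\times10^{-3}$ as above):

```python
# Sketch: number of parent decays N_A needed so that sigma_{f_N} = sqrt(p_A q_A / N_A)
# reaches 1% of p_A.
p_A  = 5e-3
goal = 0.01 * p_A                            # required uncertainty on f_N

N_A = p_A * (1.0 - p_A) / goal**2            # exact; the small-p approximation drops (1 - p_A)
print(N_A)                                   # ~ 2e6 decays
```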

Since $\sigma_{f_{N_A}} \le 0.01\,p_A = 5\times10^{-5}$ and $\sigma_{f_{N_B}} \le 5\times10^{-5}\,p_B \simeq 5\times10^{-5}$, and since the same experiment measured both $p_A$ (for A) and $p_B$ (for B) with their accompanying uncertainties, an explicit consistency check is to see if experimentally $\left(p_A + p_B\right) = 1$? Or, as a null-test:

$$
\Delta \equiv \left(p_A + p_B\right) - 1 = 0\,?
$$

The 1-sigma uncertainty on $\Delta$ is $\sigma_\Delta = \sqrt{\sigma_{f_{N_A}}^2 + \sigma_{f_{N_B}}^2} \simeq \sqrt{2}\cdot 5\times10^{-5}$.

For experimental situations where $p$ is small, then $q = 1-p \simeq 1$, hence $\sigma_{f_N} \simeq \sqrt{p/N}$, and thus the fractional uncertainty associated with the frequency of success $f_N$ in the small-p regime is:

$$
\frac{\sigma_{f_N}}{\hat{f}_N} \simeq \frac{1}{p}\sqrt{\frac{p}{N}} = \frac{1}{\sqrt{Np}} = \frac{1}{\sqrt{\hat{n}}}
$$

In other words, the fractional uncertainty of the frequency of success $\hat{f}_N \equiv \hat{n}/N$ equals 1 over the square root of the number of successes, $\hat{n}$.


In the small-p regime, the standard deviation associated with the number of successes $n$ for $N$ independent trials is: $\sigma_n = \sqrt{Npq} = \sqrt{Np\left(1-p\right)} \cong \sqrt{Np} = \sqrt{\hat{n}}$. This is often called “the statistical uncertainty on the number of successes” or, for short: “the uncertainty on the number of events”.

Thus, the statistical uncertainty on the number of events $\sigma_n$ equals the square root of the number of observed events, $\hat{n}$. The above expressions are valid only in the small-p regime, i.e. when $p$, the probability that the given event will occur, is small $(p \ll 1)$. Since from $N$ trials we have a sample of $n$ events, and $p \simeq n/N$, we see that $p$ is “small” only if $N$ is “large”. We keep these ideas in mind as we look at the Poisson Distribution... shortly…

What do we do in a situation involving the experimental measurement of a known-to-be Binomial-type process – such as the radioactive decays of nuclei – when $p$ is apriori unknown? Experimentally, we would e.g. observe/measure (a large number) $n$ (successful) independent decays of $N$ nuclei (= “# trials”, here in this example) in some (large) time interval $\Delta t$. An unbiased estimate of the decay (= success) probability per nucleus, $p$, is the sample mean: $\hat{p} = n/N$; it is an estimator of the (true, but apriori unknown) $p$, which (from the Law of Large Numbers), in the limit of $N \to \infty$ (or equivalently $n \to \infty$ {because $\Delta t \to \infty$}), satisfies $\lim_{N\to\infty}\hat{p} = p$, the true $p$. The corresponding unbiased estimate of the survival (= “failure”) probability per nucleus is $\hat{q} = 1 - \hat{p} = 1 - n/N$.

The probability $P(n; N, p)$ of obtaining $n$ successes in $N$ independent trials when (the true) $p$ (and hence true $q$) are unknown is {also} given by the following Binomial Distribution:

$$
P(n; N, p) = \frac{N!}{n!\,(N-n)!}\,p^{n}q^{N-n} = \frac{N!}{n!\,(N-n)!}\,p^{n}\left(1-p\right)^{N-n} = \binom{N}{n}p^{n}\left(1-p\right)^{N-n}
$$

The sample mean of this Binomial Distribution is:

$$
\hat{n} \equiv E[n] \equiv \mu_n = \sum_{k=0}^{N} k\cdot P(k; N, \hat{p})
 = \sum_{k=0}^{N} k\,\frac{N!}{k!\,(N-k)!}\,\hat{p}^{\,k}\hat{q}^{\,N-k}
 = \sum_{k=0}^{N} k\,\frac{N!}{k!\,(N-k)!}\,\hat{p}^{\,k}\left(1-\hat{p}\right)^{N-k}
 = N\hat{p}
$$

The (unbiased) variance of this Binomial Distribution (= an estimator of the true variance, $\sigma_n^2 = Np\left(1-p\right)$) when $p$ is apriori unknown, is:

$$
s^2 \equiv \frac{N}{N-1}\left[N\hat{p}\left(1-\hat{p}\right)\right] \cong N\hat{p}\left(1-\hat{p}\right)
\quad\text{for } N \to \infty
$$
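A minimal sketch of this p-unknown case (not from the notes; the counts below are assumed):

```python
# Sketch: estimate p from the sample mean p_hat = n/N and form the bias-corrected estimate
# s^2 = N/(N-1) * [N * p_hat * (1 - p_hat)] of the true variance sigma_n^2 = N p (1 - p).
n, N  = 137, 5000                             # assumed observed successes and trials
p_hat = n / N                                 # sample-mean estimate of p
q_hat = 1.0 - p_hat                           # estimate of the failure probability
s2    = (N / (N - 1)) * N * p_hat * q_hat     # estimator of var(n)
print(p_hat, s2)
```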


The Multinomial Probability Distribution:

The generalization of the Binomial Probability Distribution, in which there are only two outcomes (digital 1/0 – yes/no – success/failure) per trial, to multiple possible outcomes per trial (such as rolling a die – with 6 possible outcomes – or pulling a card from a deck of 52 cards, etc.) is the Multinomial Probability Distribution.

If there are $m$ possible outcomes per trial, with corresponding known probability $p_i$ of obtaining the $i$th possible outcome per trial, then since one of the $m$ possible outcomes per trial must be realized, we have the constraint relation:

$$
\sum_{i=1}^{m} p_i = 1.
$$

If $N$ total trials are carried out, each individual trial will yield one of the $m$ possible outcomes. The probability for a particular/specific sequence of outcomes, e.g. outcome $i$ on the 1st trial, outcome $j$ on the 2nd trial, etc., is the product of the $N$ corresponding probabilities $p_i\cdot p_j\cdot p_k\cdots$, resulting in $n_1$ outcomes of the 1st type, $n_2$ outcomes of the 2nd type, etc. The number of such sequences in $N$ total trials that result in $n_1$ outcomes of the 1st type, $n_2$ outcomes of the 2nd type, etc. is the multinomial factor:

$$
\frac{N!}{n_1!\,n_2!\cdots n_m!}
$$

If we do not care about the exact sequence/order of the $m$ possible outcomes – just the total numbers of outcomes of each type $n_i\ (i = 1{:}m)$ in $N$ total trials – then the joint probability for $n_1$ outcomes of the 1st type, $n_2$ outcomes of the 2nd type, etc. in $N$ total trials is the Multinomial Probability Distribution:

$$
P(n_1, n_2, \ldots, n_m; N, p_1, p_2, \ldots, p_m)
 = \underbrace{\frac{N!}{n_1!\,n_2!\cdots n_m!}}_{\text{multinomial coefficient}}\,p_1^{n_1}p_2^{n_2}\cdots p_m^{n_m}
 = \binom{N}{n_1, n_2, \ldots, n_m}\prod_{i=1}^{m}p_i^{n_i}
 = N!\prod_{i=1}^{m}\frac{p_i^{n_i}}{n_i!}
$$

Since the Multinomial Distribution is associated with $m$ possible outcomes, a complete/full graph/plot of $P(n_1, n_2, \ldots, n_m; N, p_1, p_2, \ldots, p_m)$ can only be done in $m$ dimensions… Taking projections of it onto e.g. the $n_i$ axis, the expectation value/mean of the Multinomial Distribution, viewed from this $n_i$ variable, when the $p_i$ are apriori known, is:

$$
E[n_i] = \hat{n}_i = Np_i.
$$

The variance and covariance (!) of the Multinomial Probability Distribution, when the $p_i$ are apriori known, are:

$$
\mathrm{var}(n_i) = \sigma_{n_i}^2 = E\!\left[(n_i - E[n_i])^2\right] = Np_i\left(1-p_i\right)
$$

and:

$$
\mathrm{cov}(n_i, n_j) = E\!\left[(n_i - E[n_i])(n_j - E[n_j])\right] = -Np_i p_j \qquad (i \ne j)
$$
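A short Monte-Carlo sketch (not from the notes; $N$ and the $p_i$ below are assumed) verifying these three results:

```python
# Sketch: sample a Multinomial Distribution and check E[n_i] = N p_i, var(n_i) = N p_i (1 - p_i)
# and cov(n_i, n_j) = -N p_i p_j (negative, as stated above).
import numpy as np

rng  = np.random.default_rng(0)
N, p = 100, np.array([0.2, 0.3, 0.5])                     # assumed trials and outcome probabilities
samples = rng.multinomial(N, p, size=200_000)             # rows: (n_1, n_2, n_3) per experiment

cov = np.cov(samples.T)
print(samples.mean(axis=0), N * p)                        # ~ N p_i
print(cov[0, 0], N * p[0] * (1 - p[0]))                   # variance of n_1
print(cov[0, 1], -N * p[0] * p[1])                        # covariance of n_1 and n_2 (negative)
```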


If the $m$ possible outcomes associated with the Multinomial Probability Distribution are grouped together into only two possible categories, e.g. $\{n_1\}$ (= “success”) and $\{n_2, n_3, \ldots, n_m\}$ (= “failure”), then this situation reduces to a Binomial process with $q_1 \equiv \left(1-p_1\right) = \left(p_2 + p_3 + \cdots + p_m\right)$:

$$
P(n_1; N, p_1) = \frac{N!}{n_1!\,(N-n_1)!}\,p_1^{n_1}\,q_1^{N-n_1} = \frac{N!}{n_1!\,(N-n_1)!}\,p_1^{n_1}\left(1-p_1\right)^{N-n_1}
$$

If the $m$ possible outcomes associated with the Multinomial Probability Distribution are grouped together into e.g. three possible categories, e.g. $\{n_1\}$, $\{n_2\}$ and everything else $\{n_3, n_4, \ldots, n_m\}$, the joint probability for $n_1$ outcomes of the 1st type, $n_2$ outcomes of the 2nd type, and $N - n_1 - n_2$ outcomes for everything else in $N$ total trials is:

$$
P(n_1, n_2; N, p_1, p_2) = \frac{N!}{n_1!\,n_2!\,(N-n_1-n_2)!}\,p_1^{n_1}\,p_2^{n_2}\left(1 - p_1 - p_2\right)^{N-n_1-n_2}
$$

As another example of a Multinomial Probability Distribution, consider the probability for obtaining a particular/specific histogram associated with $N$ independent measurements of an (arbitrary) random variable $x_i$.

The histogram of the random variable $x_i$ has $m$ bins. For the $N$ independent measurements of the random variable $x_i$, there will be $n_1$ entries in the 1st bin, $n_2$ entries in the 2nd bin, etc. Since the covariance of the Multinomial Probability Distribution is negative:

$$
\mathrm{cov}(n_i, n_j) = -Np_i p_j \qquad (i \ne j)
$$

this tells us/physically means that an upward (downward) fluctuation in the number of entries $n_i$ in the $i$th histogram bin of the random variable $x$ will tend to be accompanied by a downward (upward) fluctuation, respectively, in the number of entries $n_j$ of the $j$th histogram bin. As an experimenter, one could become (explicitly) aware of this fact e.g. if the entire/complete experiment, consisting of a set of $N$ measurements, is repeated many times (e.g. $10^6$ complete experiments). One can also gain some insight into such bin-correlation effects e.g. by histogramming the original data with a coarser bin size (i.e. fewer bins), or e.g. by using a histogram that has the same number of bins as the original histogram but with the bin boundaries systematically shifted, e.g. by integer multiples {1:9} of 1/10th of the bin width. For a single experiment consisting of $N$ measurements, if one is attempting to extract physical meaning e.g. from a particular/specific, finite-small subset of adjacent/contiguous histogram bins in the histogram, it is very well worth knowing/quantitatively understanding what the expected level of fluctuations in the entries of the histogram bins should be, due to such correlations….


e.g. Do you see any interesting unaccounted-for “bumps” in the inclusive $\mu^+\mu^-$ invariant mass spectra, courtesy of the ATLAS and CMS experiments at the LHC?
