Error analysis lecture 7


Physics 509: Dealing with Gaussian Distributions

Scott Oser, Lecture #7

September 25, 2008


    Outline

Last time: we finished our initial tour of Bayesian analysis, having studied the question of prior assignment in detail, particularly maximum entropy methods.

Today we return to the Gaussian distribution and highlight various important aspects:

1) The Central Limit Theorem
2) The Chebyshev inequality
3) Mathematics of multi-dimensional Gaussians
4) Gaussian error ellipses
5) Marginalizing Gaussian distributions
6) Application: the Rayleigh power test


    The Central Limit Theorem

If X is the sum of N independent random variables $x_i$, each taken from a distribution with mean $\mu_i$ and variance $\sigma_i^2$, then the distribution for X approaches a Gaussian distribution in the limit of large N. The mean and variance of this Gaussian are given by:

$$\langle X \rangle = \sum_i \mu_i, \qquad V_X = \sum_i V_i = \sum_i \sigma_i^2$$
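
To make the theorem concrete, here is a minimal Monte Carlo sketch in Python (my own illustration; the three component distributions are arbitrary choices) confirming that the means and variances add even when the components are not identically distributed:

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 200_000

# Three non-identical independent components.
x1 = rng.uniform(0.0, 2.0, n_trials)                 # mean 1.0, variance 1/3
x2 = rng.exponential(scale=1.5, size=n_trials)       # mean 1.5, variance 2.25
x3 = rng.normal(loc=-0.5, scale=2.0, size=n_trials)  # mean -0.5, variance 4.0

X = x1 + x2 + x3
print("mean of X:", X.mean(), " expected:", 1.0 + 1.5 - 0.5)
print("var of X: ", X.var(), " expected:", 1/3 + 2.25 + 4.0)
```

A histogram of X already looks roughly Gaussian even with only three terms in the sum.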


    The Central Limit Theorem: the caveats

I said N independent variables! Obviously the variables must individually have finite variances. I've said nothing about how fast the distribution approaches a Gaussian as N goes to infinity. It is not necessary for all N variables to be identically distributed, provided that the following conditions apply (the Lyapunov conditions):

A. $r_n^3 = \sum_{i=1}^{n} E\left[\,\lvert X_i - \mu_i \rvert^3\,\right]$ must be finite for every n (this is the sum of the third absolute central moments)

B. $\lim_{n\to\infty} \dfrac{r_n}{s_n} = 0$, where $s_n^2 = \sum_{i=1}^{n} \sigma_i^2$


    The Central Limit Theorem: the benefits

The beauty of the central limit theorem is that many very complicated factors all tend to come out in the wash. For example, normal distributions are used to model:

- the heights of adults in a population
- the logarithm of the heights/weights/lengths of members of a biological population
- the intensity of laser light
- changes in the logarithm of exchange rates, price indices, and stock market indices

Note that sometimes we assume the logarithm of a quantity follows a normal distribution rather than the quantity itself. The reason is that if the various factors are all multiplicative rather than additive, then the central limit theorem doesn't apply to their sum, but it does apply to the sum of their logarithms. The result is called a log-normal distribution.

The normal distribution is more likely to describe the meat of a distribution than its tails.
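
As a small illustration of this point (a sketch of my own, with made-up factor distributions): a product of many independent positive factors is noticeably skewed, while its logarithm is close to Gaussian:

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_factors = 100_000, 50

# Product of many independent positive factors (e.g. multiplicative growth).
factors = rng.uniform(0.9, 1.1, size=(n_samples, n_factors))
product = factors.prod(axis=1)
logs = np.log(product)  # a sum of 50 independent terms -> CLT applies

def skew(a):
    """Sample skewness: zero for a symmetric (e.g. Gaussian) distribution."""
    return ((a - a.mean()) ** 3).mean() / a.std() ** 3

print("skewness of the product:     ", skew(product))
print("skewness of log(the product):", skew(logs))
```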


    CLT: How much is enough?

How many independent variables do you need to add in order to get a very good normal distribution? Difficult question---it depends on what the component distributions look like.

Best solution: simulate it, as in the sketch after this list.

Possibly useful convergence theorems for identically distributed variables:

- Convergence is monotonic with N---as N increases, the entropy of the distribution monotonically increases to approach a normal distribution's entropy (remember maximum entropy principles).
- If the third central moment is finite, then the speed of convergence (as measured by the difference between the true cumulative distribution and the normal cumulative distribution at a fixed point) is at least as fast as 1/sqrt(N).
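
Here is the sort of simulation meant above (a sketch; exponential components are an arbitrary test case). It measures the gap between the standardized sum's empirical CDF and the normal CDF at a fixed point, for increasing N:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n_trials = 100_000
point = 0.5  # fixed evaluation point, in standardized units

for N in (2, 8, 32, 128):
    # Sum of N i.i.d. exponential(1) variables has mean N and variance N.
    sums = rng.exponential(1.0, size=(n_trials, N)).sum(axis=1)
    z = (sums - N) / np.sqrt(N)  # standardize
    gap = abs((z <= point).mean() - norm.cdf(point))
    print(f"N={N:4d}  |F_N - Phi| at z=0.5: {gap:.4f}   1/sqrt(N) = {N**-0.5:.4f}")
```

The gap shrinks with N roughly in line with the 1/sqrt(N) scaling quoted above.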


    CLT: Heliostat Example

[Figure: a heliostat, a multi-faceted mirror that tracks the Sun. Overlapping Sun images, each a uniform disk.]


    The Chebyshev inequality

Suppose that you make a measurement, and get a result that is 3 standard deviations from the mean. What is the probability of that happening?

If you know the PDF (Gaussian or otherwise), of course you just calculate it.

But what if you don't know the form of the PDF?

The Chebyshev inequality gives you a way to put an upper limit on the probability of seeing an outlier---even when you know almost nothing about the distribution.


    Statement of the Chebyshev inequality

Let X be a random variable for which Var(X) exists. Then for any given number t > 0 the following is true:

$$P\big(\lvert X - E(X) \rvert \ge t\big) \le \frac{\mathrm{Var}(X)}{t^2}$$

Equivalently, in units of the standard deviation $\sigma$:

$$P\big(\lvert X - E(X) \rvert \ge t\sigma\big) \le \frac{1}{t^2}$$

For example, the probability of getting a measurement that differs from the true mean by more than $3\sigma$ is at most 1/9 (about 11%), no matter what the distribution is.
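
A quick numeric check (a sketch; the exponential distribution is an arbitrary, deliberately non-Gaussian choice). The observed tail fractions always respect the bound, though the bound is usually far from tight:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(1.0, size=1_000_000)  # skewed, decidedly non-Gaussian
mu, sigma = x.mean(), x.std()

for k in (2, 3, 4):
    observed = (np.abs(x - mu) >= k * sigma).mean()
    print(f"k={k}: observed {observed:.4f} <= Chebyshev bound {1 / k**2:.4f}")
```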


    Review of covariances of joint PDFs

Consider some multidimensional PDF $p(x_1, \ldots, x_n)$. We define the covariance between any two variables by:

$$\mathrm{cov}(x_i, x_j) = \int d\vec{x}\; p(\vec{x})\, (x_i - \langle x_i \rangle)(x_j - \langle x_j \rangle)$$

The set of all possible covariances defines a covariance matrix, often denoted by $V_{ij}$. The diagonal elements of $V_{ij}$ are the variances of the individual variables, while the off-diagonal elements are related to the correlation coefficients:

$$V_{ij} = \begin{pmatrix} \sigma_1^2 & \rho_{12}\sigma_1\sigma_2 & \cdots & \rho_{1n}\sigma_1\sigma_n \\ \rho_{21}\sigma_2\sigma_1 & \sigma_2^2 & \cdots & \rho_{2n}\sigma_2\sigma_n \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{n1}\sigma_n\sigma_1 & \rho_{n2}\sigma_n\sigma_2 & \cdots & \sigma_n^2 \end{pmatrix}$$
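
In practice the covariance matrix of a sample is a one-liner; a short sketch (the correlated sample itself is synthetic):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

# Build a correlated pair: y depends partly on x.
x = rng.normal(size=n)
y = 0.8 * x + 0.6 * rng.normal(size=n)
data = np.vstack([x, y])

V = np.cov(data)         # covariance matrix V_ij
R = np.corrcoef(data)    # matrix of correlation coefficients rho_ij
print("V =\n", V)
print("rho_xy =", R[0, 1], " check:", V[0, 1] / np.sqrt(V[0, 0] * V[1, 1]))
```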


Approximating the peak of a PDF with a multidimensional Gaussian

Suppose we have some complicated-looking PDF in 2D that has a well-defined peak. How might we approximate the shape of this PDF around its maximum?


    Taylor Series expansion

Consider a Taylor series expansion of the logarithm of the PDF around its maximum at $(x_0, y_0)$:

$$\log P(x,y) = P_0 + A(x-x_0) + B(y-y_0) - \frac{1}{2}\left[ C(x-x_0)^2 + D(y-y_0)^2 + 2E(x-x_0)(y-y_0) \right] + \ldots$$

Since we are expanding around the peak, the first derivatives must equal zero, so A = B = 0. The remaining terms can be written in matrix form:

$$\log P(x,y) \approx P_0 - \frac{1}{2}\begin{pmatrix} x-x_0 & y-y_0 \end{pmatrix}\begin{pmatrix} C & E \\ E & D \end{pmatrix}\begin{pmatrix} x-x_0 \\ y-y_0 \end{pmatrix}$$

In order for $(x_0, y_0)$ to be a maximum of the PDF (and not a minimum or saddle point), the above matrix must be positive definite, and therefore invertible.


    Taylor Series expansion

Let me now suggestively denote the inverse of the above matrix by $V_{ij}$. It's a positive definite matrix with three parameters. In fact, I might as well call these parameters $\sigma_x$, $\sigma_y$, and $\rho$.

Exponentiating, we see that around its peak the PDF can be approximated by a multidimensional Gaussian. The full formula, including normalization, is

$$P(x,y) = \frac{1}{2\pi \sigma_x \sigma_y \sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \left(\frac{x-x_0}{\sigma_x}\right)^2 + \left(\frac{y-y_0}{\sigma_y}\right)^2 - 2\rho \left(\frac{x-x_0}{\sigma_x}\right)\left(\frac{y-y_0}{\sigma_y}\right) \right] \right\}$$

This is a good approximation as long as higher-order terms in the Taylor series are small.
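
Numerically, reading off $\sigma_x$, $\sigma_y$, and $\rho$ from the curvature of $\log P$ at the peak is just a matrix inversion. A sketch with made-up curvature coefficients C, D, E (using the sign convention above, where the quadratic form of $-\log P$ is $\frac{1}{2}\Delta^T M \Delta$):

```python
import numpy as np

# Made-up second-derivative coefficients of -log P at the peak.
C, D, E = 2.0, 3.0, 1.0
M = np.array([[C, E],
              [E, D]])

V = np.linalg.inv(M)  # covariance matrix of the Gaussian approximation
sigma_x, sigma_y = np.sqrt(np.diag(V))
rho = V[0, 1] / (sigma_x * sigma_y)
print(f"sigma_x = {sigma_x:.3f}, sigma_y = {sigma_y:.3f}, rho = {rho:.3f}")
```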


    Interpretation of multidimensional Gaussian

$$P(x,y) = \frac{1}{2\pi \sigma_x \sigma_y \sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \left(\frac{x-x_0}{\sigma_x}\right)^2 + \left(\frac{y-y_0}{\sigma_y}\right)^2 - 2\rho \left(\frac{x-x_0}{\sigma_x}\right)\left(\frac{y-y_0}{\sigma_y}\right) \right] \right\}$$

Can I directly relate the free parameters to the covariance matrix? First calculate P(x) by marginalizing over y:

$$P(x) \propto \exp\left\{ -\frac{1}{2(1-\rho^2)} \left(\frac{x-x_0}{\sigma_x}\right)^2 \right\} \int dy\, \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \left(\frac{y-y_0}{\sigma_y}\right)^2 - 2\rho \left(\frac{x-x_0}{\sigma_x}\right)\left(\frac{y-y_0}{\sigma_y}\right) \right] \right\}$$

Complete the square in y by adding and subtracting $\rho^2 \left(\frac{x-x_0}{\sigma_x}\right)^2$ inside the brackets:

$$P(x) \propto \exp\left\{ -\frac{1}{2(1-\rho^2)} \left(\frac{x-x_0}{\sigma_x}\right)^2 \right\} \int dy\, \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \left(\frac{y-y_0}{\sigma_y} - \rho\,\frac{x-x_0}{\sigma_x}\right)^2 - \rho^2\left(\frac{x-x_0}{\sigma_x}\right)^2 \right] \right\}$$

The integral over y is now just a Gaussian integral whose value is independent of x, so

$$P(x) \propto \exp\left\{ -\frac{1}{2(1-\rho^2)} \left(\frac{x-x_0}{\sigma_x}\right)^2 \right\} \exp\left\{ +\frac{\rho^2}{2(1-\rho^2)} \left(\frac{x-x_0}{\sigma_x}\right)^2 \right\} = \exp\left\{ -\frac{1}{2} \left(\frac{x-x_0}{\sigma_x}\right)^2 \right\}$$

So we get a Gaussian with width $\sigma_x$. The calculation for $\sigma_y$ is similar, and one can also show that $\rho$ is the correlation coefficient.
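
The same result can be checked numerically; a brief sketch that integrates the 2D Gaussian over y on a grid and compares against a 1D Gaussian of width $\sigma_x$ (parameter values chosen arbitrarily):

```python
import numpy as np

sigma_x, sigma_y, rho = 2.0, 1.0, 0.8
x0 = y0 = 0.0

def P(x, y):
    """Normalized bivariate Gaussian PDF."""
    u, v = (x - x0) / sigma_x, (y - y0) / sigma_y
    norm = 2 * np.pi * sigma_x * sigma_y * np.sqrt(1 - rho**2)
    return np.exp(-(u**2 + v**2 - 2 * rho * u * v) / (2 * (1 - rho**2))) / norm

y = np.linspace(-12, 12, 4001)
dy = y[1] - y[0]
for x in (0.0, 1.0, 3.0):
    numeric = (P(x, y) * dy).sum()  # marginalize over y on the grid
    analytic = np.exp(-(x - x0)**2 / (2 * sigma_x**2)) / (np.sqrt(2 * np.pi) * sigma_x)
    print(f"x = {x}: numeric {numeric:.6f}  analytic {analytic:.6f}")
```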


    P(x|y)

$$P(x,y) = \frac{1}{2\pi \sigma_x \sigma_y \sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \left(\frac{x-x_0}{\sigma_x}\right)^2 + \left(\frac{y-y_0}{\sigma_y}\right)^2 - 2\rho \left(\frac{x-x_0}{\sigma_x}\right)\left(\frac{y-y_0}{\sigma_y}\right) \right] \right\}$$

Note: if you view y as a fixed parameter, then the PDF P(x|y) is a Gaussian with a width of

$$\sigma_x \sqrt{1-\rho^2}$$

and a mean value of

$$x_0 + \rho\,\frac{\sigma_x}{\sigma_y}\,(y - y_0)$$

It makes sense that the width of P(x|y) is always narrower than the width of the marginalized PDF P(x) (integrated over y). If you know the actual value of y, you have additional information and so a tighter constraint on x.


[Figure: error ellipses for a 2D Gaussian with $\sigma_x = 2$, $\sigma_y = 1$, $\rho = 0.8$. Red ellipse: contour with the argument of the exponential set to equal -1/2. Blue ellipse: contour containing 68% of the 2D probability content.]


    Contour ellipses

The contour ellipses are defined by setting the argument of the exponential equal to a constant. The exponential's argument equals -1/2 on the red ellipse from the previous figure. The parameters of this ellipse are:

$$\tan 2\theta = \frac{2\rho \sigma_x \sigma_y}{\sigma_x^2 - \sigma_y^2}$$

$$u^2 = \frac{\sigma_x^2 \cos^2\theta - \sigma_y^2 \sin^2\theta}{\cos^2\theta - \sin^2\theta}, \qquad v^2 = \frac{\sigma_y^2 \cos^2\theta - \sigma_x^2 \sin^2\theta}{\cos^2\theta - \sin^2\theta}$$

where $\theta$ is the rotation angle of the ellipse axes and u and v are the semi-axis lengths along the rotated directions.
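
These formulas are easy to check against a direct eigen-decomposition of the covariance matrix; a sketch using the example values from the figure:

```python
import numpy as np

sigma_x, sigma_y, rho = 2.0, 1.0, 0.8

# Rotation angle of the ellipse axes.
theta = 0.5 * np.arctan2(2 * rho * sigma_x * sigma_y, sigma_x**2 - sigma_y**2)
c2, s2 = np.cos(theta)**2, np.sin(theta)**2
u2 = (sigma_x**2 * c2 - sigma_y**2 * s2) / (c2 - s2)
v2 = (sigma_y**2 * c2 - sigma_x**2 * s2) / (c2 - s2)

# Cross-check: u^2 and v^2 should be the eigenvalues of the covariance matrix.
V = np.array([[sigma_x**2,              rho * sigma_x * sigma_y],
              [rho * sigma_x * sigma_y, sigma_y**2]])
print("theta =", np.degrees(theta), "degrees")
print("u^2, v^2 =", u2, v2)
print("eigenvalues:", np.linalg.eigvalsh(V))
```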


    Probability content inside a contour ellipse

For a 1D Gaussian $\exp(-x^2/2\sigma^2)$, the $\pm 1\sigma$ limits occur when the argument of the exponent equals -1/2, and there's a 68% chance of the measurement falling within $\pm 1\sigma$ of the mean.

But for a 2D Gaussian this is not the case. Easiest to see this for the simple case of $\sigma_x = \sigma_y = 1$, $\rho = 0$:

$$\frac{1}{2\pi} \int dx\, dy\, \exp\left[-\frac{1}{2}\left(x^2+y^2\right)\right] = \int_0^{r_0} dr\, r\, \exp\left[-\frac{r^2}{2}\right] = 0.68$$

Evaluating this integral and solving gives $r_0^2 = 2.3$. So 68% of the probability content is contained within a radius of $\sigma\sqrt{2.3}$.

We call this the 2D 1$\sigma$ contour. Note that it's bigger than the 1D version---if you pick points inside the 2D 68% contour and plot their x coordinates, they'll span a wider range than those picked from the 68% contour of the 1D marginalized PDF!


[Figure: the same error ellipses as above ($\sigma_x = 2$, $\sigma_y = 1$, $\rho = 0.8$), repeated. Red ellipse: contour with the argument of the exponential set to equal -1/2. Blue ellipse: contour containing 68% of the probability content.]


Marginalization by minimization

Normal marginalization procedure: integrate over y.

For a multidimensional Gaussian, this gives the same answer as finding the extrema of the ellipse---for every x, find the value of y that maximizes the likelihood.

For example, at x=2 the value of y which maximizes the likelihood is just where the dashed line touches the ellipse. The value of the likelihood at that point is then the value of P(x).


Two marginalization procedures

Normal marginalization procedure: integrate over the nuisance variables:

$$P(x) = \int dy\, P(x,y)$$

Alternate marginalization procedure: maximize the likelihood as a function of the nuisance variables, and return the result:

$$P(x) \propto \max_y P(x,y)$$

(It is not necessarily the case that the resulting PDF is normalized.)

I can prove for Gaussian distributions that these two marginalization procedures are equivalent, but this does not hold for the general case (in fact they generally give different results).

Bayesians always follow the first prescription. Frequentists most often use the second.

Sometimes it will be computationally easier to apply one, sometimes the other, even for PDFs that are approximately Gaussian. A numerical comparison is sketched below.
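
Here is a minimal numerical comparison for the Gaussian case (a sketch with arbitrary parameters). Both procedures give a function of x proportional to the same Gaussian, so their ratio is constant in x:

```python
import numpy as np

sigma_x, sigma_y, rho = 2.0, 1.0, 0.8

def P(x, y):
    """Unnormalized bivariate Gaussian."""
    u, v = x / sigma_x, y / sigma_y
    return np.exp(-(u**2 + v**2 - 2 * rho * u * v) / (2 * (1 - rho**2)))

y = np.linspace(-12, 12, 8001)
dy = y[1] - y[0]
for x in (0.5, 1.0, 2.0):
    integrated = (P(x, y) * dy).sum()  # marginalize by integrating over y
    profiled = P(x, y).max()           # marginalize by maximizing over y
    print(f"x = {x}: integrated/profiled = {integrated / profiled:.6f}")
```

The printed ratio is the same for every x, so after normalization the two procedures coincide; for a non-Gaussian PDF this ratio would generally depend on x.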



Example of the CLT: Do werewolf sightings vary with the phase of the moon?


    Rayleigh power periodicity test

Imagine doing a random walk in 2D. If all directions (phases) are equally likely, there is no net displacement on average. If some phases are more likely than others, on average you get a net displacement.

This motivates the Rayleigh power statistic:

$$S = \left( \sum_{i=1}^{N} \sin \omega t_i \right)^2 + \left( \sum_{i=1}^{N} \cos \omega t_i \right)^2$$

This is really just the squared length of the displacement vector from the origin. For the werewolf data, S = 8167.
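
Computing S from a list of event times is only a few lines. A sketch with simulated times (drawn, for illustration, from the modulated rate revealed at the end of the lecture; the resulting S will not reproduce the 8167 of the real data set):

```python
import numpy as np

rng = np.random.default_rng(6)
N = 885
period = 28.0                  # days; the period being tested
omega = 2 * np.pi / period

# Simulated sighting times from P(t) ~ 0.9 + 0.1 sin(omega t),
# via rejection sampling over many periods.
t = rng.uniform(0, 100 * period, size=20 * N)
accept = rng.uniform(size=t.size) < (0.9 + 0.1 * np.sin(omega * t))
t = t[accept][:N]

S = np.sin(omega * t).sum()**2 + np.cos(omega * t).sum()**2
print("Rayleigh power S =", S)
```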


    Null hypothesis expectation for Rayleigh power

So S = 8167. Is that likely or not? What do we expect to get?

If there is no real time variation, all phases are equally likely. It's like a random walk in 2D. By the Central Limit Theorem, the total displacement in x or in y should be Gaussian with mean 0 and variance $N\sigma^2$, where

$$\sigma^2 = \frac{1}{2\pi}\int_0^{2\pi} \cos^2\theta\, d\theta = \frac{1}{2\pi}\int_0^{2\pi} \sin^2\theta\, d\theta = \frac{1}{2}$$

Since the average displacements in x and y are uncorrelated (you can calculate the covariance to show this), the joint PDF must be

$$P(x,y) = \frac{1}{\pi N} \exp\left[ -\frac{1}{N}\left(x^2+y^2\right) \right]$$

We can do a change of variables to get this as a 1D PDF in $s = r^2$ (marginalizing over the angle):

$$P(s) = \frac{1}{N}\, e^{-s/N}$$
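
A quick Monte Carlo check of this null distribution (a sketch; the number of pseudo-experiments is arbitrary): S should be exponentially distributed with mean N:

```python
import numpy as np

rng = np.random.default_rng(7)
N, n_experiments = 885, 20_000

# Under the null hypothesis all phases are uniform on [0, 2*pi).
phases = rng.uniform(0, 2 * np.pi, size=(n_experiments, N))
S = np.sin(phases).sum(axis=1)**2 + np.cos(phases).sum(axis=1)**2

print("mean of S:", S.mean(), " expected:", N)
print("P(S > 3N):", (S > 3 * N).mean(), " expected:", np.exp(-3.0))
```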


    So, do werewolves come out with the full moon?

The data set had S = 8167 for N = 885. How likely is that?

Assuming werewolf sightings occur randomly in time, the probability of getting s > 8167 is:

$$P(s > 8167) = \int_{8167}^{\infty} \frac{1}{885}\, e^{-s/885}\, ds = \exp\left[-8167/885\right] \approx 10^{-4}$$

Because this is a very small probability, we conclude that werewolves DO come out with the full moon.

(Actually, we should only conclude that their appearances vary with a period of 28 days---maybe they only come out during the new moon ...)

The data was actually drawn from a distribution:

$$P(t) \propto 0.9 + 0.1 \sin \omega t$$


    How powerful is the Rayleigh power test?

Rayleigh power probability: 0.0001

Compare to a $\chi^2$ test on the folded phase diagram: $\chi^2$ = 48.4 for 27 d.o.f., giving P = 0.0025.


    To save time, let's get started on Lecture 8.