
A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics, Spring 2011

Lecture 8

Reading
• Chapter 5 (continued)

Topics
• Key points in probability
• CLT
• CLT examples


Prior vs Likelihood

(Figure from Box & Tiao.)


“Learning” in Bayesian Estimation

(Figure from Box & Tiao.)


I. Mutually exclusive events:

If $a$ occurs then $b$ cannot have occurred.

Let $c = a + b$ ("or"; the same as $a \cup b$):

$P(c) = P\{a \text{ or } b \text{ occurred}\} = P(a) + P(b)$

Let $d = a \cdot b$ ("and"; the same as $a \cap b$):

$P(d) = P\{a \text{ and } b \text{ occurred}\} = 0$ if mutually exclusive.

II. Non-mutually exclusive events:

$P(c) = P\{a \text{ or } b\} = P(a) + P(b) - P(ab)$

III. Independent events:

$P(ab) \equiv P(a)P(b)$

Examples

I. Mutually exclusive events

Toss a coin once: there are 2 possible outcomes, H and T. H and T are mutually exclusive. H and T are not independent, because $P(HT) = P\{\text{heads and tails}\} = 0$, so $P(HT) \neq P(H)P(T)$.


II. Independent events

Toss a coin twice (= the experiment). The outcomes of the experiment are:

1st toss   2nd toss
H1         H2
H1         T2
T1         H2
T1         T2

Events might be defined as:

$H_1H_2$ = event that H on 1st toss, H on 2nd
$H_1T_2$ = event that H on 1st toss, T on 2nd
$T_1H_2$ = event that T on 1st toss, H on 2nd
$T_1T_2$ = event that T on 1st toss, T on 2nd

Note $P(H_1H_2) = P(H_1)P(H_2)$ [as long as the coin is not altered between tosses].
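As a quick empirical check (a sketch added here, not in the original notes; the fair coin is simulated with numpy and the sample size is arbitrary), one can verify that the joint probability of two independent tosses factors:

```python
import numpy as np

# Minimal sketch: verify empirically that the two tosses are independent,
# i.e. P(H1 H2) ~= P(H1) P(H2).
rng = np.random.default_rng(0)
n = 100_000
toss1 = rng.integers(0, 2, n)   # 1 = heads, 0 = tails
toss2 = rng.integers(0, 2, n)

p_h1 = np.mean(toss1 == 1)
p_h2 = np.mean(toss2 == 1)
p_h1h2 = np.mean((toss1 == 1) & (toss2 == 1))

print(f"P(H1)P(H2) = {p_h1 * p_h2:.4f},  P(H1 H2) = {p_h1h2:.4f}")
# The two numbers agree to within Monte Carlo error (~1/sqrt(n)).
```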


Random Variables

Of interest to us is the distribution of probability along the real number axis. Random variables assign numbers to events or, more precisely, map the event space into a set of numbers:

$a \mapsto X(a)$
(event $\mapsto$ number)

The definition of probability translates directly over to the numbers that are assigned by random variables. The following properties are true for a real random variable:

1. Let $\{X \le x\}$ = the event that the r.v. $X$ is less than or equal to the number $x$, defined for all $x$ [this defines all intervals on the real number line to be events].

2. The events $\{X = +\infty\}$ and $\{X = -\infty\}$ have zero probability. (Otherwise, moments would generally not be finite.)

Distribution function (CDF = Cumulative Distribution Function):

$F_X(x) = P\{X \le x\} \equiv P\{\text{all events } A : X(A) \le x\}$

Properties:

1. $F_X(x)$ is a monotonically increasing function of $x$.
2. $F(-\infty) = 0$, $F(+\infty) = 1$.
3. $P\{x_1 \le X \le x_2\} = F(x_2) - F(x_1)$.

Probability Density Function (PDF):

$f_X(x) = \frac{dF_X(x)}{dx}$

Properties:

1. $f_X(x)\,dx = P\{x \le X \le x + dx\}$
2. $\int_{-\infty}^{\infty} dx\, f_X(x) = F_X(\infty) - F_X(-\infty) = 1 - 0 = 1$


All three measures (mean, median, and mode) are localization measures. Other quantities are needed to measure the width and asymmetry of the PDF, etc.


Continuous r.v.'s: the derivative of $F_X(x)$ exists $\forall x$.

Discrete random variables: use delta functions to write the PDF in pseudo-continuous form, e.g. coin flipping. Let

$X = \begin{cases} 1 & \text{heads} \\ -1 & \text{tails} \end{cases}$

Then

$f_X(x) = \frac{1}{2}\left[\delta(x+1) + \delta(x-1)\right]$

$F_X(x) = \frac{1}{2}\left[U(x+1) + U(x-1)\right]$

where $U$ is the unit step function.

Functions of a random variable:

The function $Y = g(X)$ is a random variable that is a mapping from some event $A$ to a number $Y$ according to:

$Y(A) = g[X(A)]$

Theorem: if $Y = g(X)$, then the PDF of $Y$ is

$f_Y(y) = \sum_{j=1}^{n} \frac{f_X(x_j)}{\left|dg(x)/dx\right|_{x=x_j}},$

where $x_j$, $j = 1, \dots, n$, are the solutions of $x = g^{-1}(y)$. Note that the normalization property is conserved (unit area).

This is one of the most important equations!

Example

$Y = g(X) = aX + b$

$\frac{dg}{dx} = a, \qquad g^{-1}(y) = x_1 = \frac{y - b}{a}$

$f_Y(y) = \frac{f_X(x_1)}{|dg(x_1)/dx|} = |a|^{-1}\, f_X\!\left(\frac{y - b}{a}\right).$
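A quick numerical check of this example (a sketch, not from the notes; it assumes a standard normal $X$ and arbitrary values of $a$ and $b$): the histogram of $Y = aX + b$ should match $|a|^{-1} f_X((y-b)/a)$.

```python
import numpy as np
from scipy.stats import norm

# Sketch: check f_Y(y) = |a|^{-1} f_X((y - b)/a) for Y = aX + b,
# with X standard normal; a, b, and sample size are arbitrary.
a, b = 2.0, 1.0
rng = np.random.default_rng(1)
x = rng.standard_normal(200_000)
y = a * x + b

hist, edges = np.histogram(y, bins=100, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
f_y = norm.pdf((centers - b) / a) / abs(a)   # transformation theorem

print("max |empirical - theoretical| =", np.max(np.abs(hist - f_y)))
```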


Comment about “natural” random number generators


To check: show that $\int_{-\infty}^{\infty} dy\, f_Y(y) = 1$.

Example: Suppose we want to transform from a uniform distribution to an exponential distribution. We want $f_Y(y) = \exp(-y)$. A typical random number generator gives $f_X(x)$ with

$f_X(x) = \begin{cases} 1, & 0 \le x < 1; \\ 0, & \text{otherwise.} \end{cases}$

Choose $y = g(x) = -\ln(x)$. Then:

$\frac{dg}{dx} = -\frac{1}{x}, \qquad x_1 = g^{-1}(y) = e^{-y}$

$f_Y(y) = \frac{f_X[\exp(-y)]}{\left|-1/x_1\right|} = x_1 = e^{-y}.$
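The same transform in code (a sketch; the generator, sample size, and binning are arbitrary choices):

```python
import numpy as np

# Sketch: map uniform deviates on [0, 1) through y = -ln(x) and confirm
# the result is exponentially distributed, f_Y(y) = exp(-y).
rng = np.random.default_rng(2)
x = rng.random(200_000)            # f_X(x) = 1 on [0, 1)
y = -np.log(1.0 - x)               # avoids log(0); same distribution
hist, edges = np.histogram(y, bins=60, range=(0.0, 6.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print("max deviation from exp(-y):",
      np.max(np.abs(hist - np.exp(-centers))))
```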

Moments

We will always use angular brackets $\langle\;\rangle$ to denote an average over an ensemble (integrating over an ensemble); time averages and other sample averages will be denoted differently.

Expected value of a random variable:

$E(X) \equiv \langle X \rangle = \int dx\, x\, f_X(x)$

($\langle\;\rangle$ denotes expectation w.r.t. the PDF of $x$.)

Arbitrary power:

$\langle X^n \rangle = \int dx\, x^n f_X(x)$

Variance:

$\sigma_x^2 = \langle X^2 \rangle - \langle X \rangle^2$

Function of a random variable: If $Y = g(X)$ and $\langle Y \rangle \equiv \int dy\, y\, f_Y(y)$, then it is easy to show that

$\langle Y \rangle = \int dx\, g(x)\, f_X(x).$

Proof:

$\langle Y \rangle \equiv \int dy\, y\, f_Y(y) = \int dy\, y \sum_{j=1}^{n} \frac{f_X[x_j(y)]}{\left|dg[x_j(y)]/dx\right|}$

Factoid: Poisson events in time have spacings that are exponentially distributed.


A change of variable, $dy = \frac{dg}{dx}\,dx$, yields the result.

Central moments:

$\mu_n = \left\langle (X - \langle X \rangle)^n \right\rangle$

Moment tests:

Moments are useful for testing hypotheses, such as whether a given PDF is consistent with data. E.g., consistency with a Gaussian PDF:

kurtosis: $k = \frac{\mu_4}{\mu_2^2} - 3 = 0$

skewness parameter: $\gamma = \frac{\mu_3}{\mu_2^{3/2}} = 0$

$k > 0$ $\Rightarrow$ 4th moment proportionately larger $\Rightarrow$ larger-amplitude tails than a Gaussian and less probable values near the mean.
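These statistics are easy to compute from data (a sketch; it assumes scipy is available and uses a Laplace distribution as a heavy-tailed comparison case):

```python
import numpy as np
from scipy import stats

# Sketch of a moment test: compare sample skewness and excess kurtosis
# against the Gaussian values (both zero).
rng = np.random.default_rng(3)
gauss = rng.standard_normal(100_000)
laplace = rng.laplace(size=100_000)      # heavier tails than Gaussian

for name, data in [("Gaussian", gauss), ("Laplace", laplace)]:
    print(f"{name}: skewness = {stats.skew(data):+.3f}, "
          f"excess kurtosis = {stats.kurtosis(data):+.3f}")
# The Laplace case shows excess kurtosis ~ 3; the Gaussian case ~ 0.
```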


Uses of Moments:

Often one wants to infer the underlying PDF of an observable, e.g. perhaps because determination of the PDF is tantamount to understanding the underlying physics of some process. Two approaches are:

1. Construct a histogram and compare the shape with a theoretical shape.

2. Determine some of the moments (usually low-order) and compare.

Suppose the data are $\{x_j,\ j = 1, \dots, N\}$.

1. One could form bins of size $\Delta x$ and count how many $x_j$ fall into each bin. If $N$ is large enough so that $n_k$ = the number of points in the $k$-th bin is also large, then a reasonably good estimate of the PDF can be made. (But beware of the dependence of results on the choice of binning.)

2. However, often $N$ is too small, or one would like to determine only basic information about the shape of the distribution (is it symmetric?), or determine the mean and variance of the PDF, or test whether the data are consistent with a given PDF (hypothesis testing).

Some typical situations are:

i) assume the data were drawn from a Gaussian parent PDF; estimate the mean and $\sigma$ of the Gaussian [parameter estimation];

ii) test whether the data are consistent with a Gaussian PDF [moment test].

Note that if the r.v. is zero mean, then the PDF is determined solely by one parameter, $\sigma$:

$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-x^2/2\sigma^2}$

The moments are

$\langle x^n \rangle = \begin{cases} 1 \cdot 3 \cdots (n-1)\,\sigma^n \equiv (n-1)!!\,\sigma^n & n \text{ even} \\ 0 & n \text{ odd} \end{cases}$

Therefore the $n = 2$ moment, the first non-zero moment, determines all other moments. This statement carries over to multi-dimensional Gaussian processes: any moment of order higher than 2 is redundant, or can be used as a test for Gaussianity.
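A Monte Carlo check of the moment formula (a sketch; the value of sigma and the sample size are arbitrary):

```python
import numpy as np
from scipy.special import factorial2

# Sketch: check <x^n> = (n-1)!! sigma^n for even n, zero-mean Gaussian.
rng = np.random.default_rng(4)
sigma = 1.5
x = rng.normal(0.0, sigma, 1_000_000)

for n in (2, 4, 6):
    mc = np.mean(x**n)
    exact = factorial2(n - 1) * sigma**n
    print(f"n={n}: Monte Carlo {mc:10.3f}   exact {exact:10.3f}")
```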


Characteristic Function:

Of considerable use is the characteristic function

$\Phi_X(\omega) \equiv \langle e^{i\omega x} \rangle = \int dx\, f_X(x)\, e^{i\omega x}.$

If we know $\Phi_X(\omega)$, then we know all there is to know about the PDF, because

$f_X(x) = \frac{1}{2\pi} \int d\omega\, \Phi_X(\omega)\, e^{-i\omega x}$

is the inversion formula.

If we know all the moments of $f_X(x)$, then we can also completely characterize $f_X(x)$. Similarly, the characteristic function is a moment-generating function:

$\Phi_X(\omega) = \langle e^{i\omega X} \rangle = \left\langle \sum_{n=0}^{\infty} \frac{(i\omega X)^n}{n!} \right\rangle = \sum_{n=0}^{\infty} \frac{(i\omega)^n}{n!}\, \langle X^n \rangle$

because the expectation of a sum equals the sum of the expectations.

By taking derivatives we can show that

$\left.\frac{\partial \Phi}{\partial \omega}\right|_{\omega=0} = i\langle X \rangle, \qquad \left.\frac{\partial^2 \Phi}{\partial \omega^2}\right|_{\omega=0} = i^2\langle X^2 \rangle, \qquad \left.\frac{\partial^n \Phi}{\partial \omega^n}\right|_{\omega=0} = i^n\langle X^n \rangle$

or

$\langle X^n \rangle = i^{-n} \left.\frac{\partial^n \Phi}{\partial \omega^n}\right|_{\omega=0} = (-i)^n \left.\frac{\partial^n \Phi}{\partial \omega^n}\right|_{\omega=0}$ (Price's theorem).

Characteristic functions are useful for deriving PDFs of combinations of r.v.'s as well as for deriving particular moments.
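The moment-generation recipe can be exercised symbolically (a sketch using sympy, assuming a zero-mean Gaussian whose characteristic function is $e^{-\sigma^2\omega^2/2}$):

```python
import sympy as sp

# Sketch: recover Gaussian moments from the characteristic function
# Phi(w) = exp(-sigma^2 w^2 / 2) via <X^n> = (-i)^n d^n Phi/dw^n at w = 0.
w, sigma = sp.symbols('omega sigma', positive=True)
Phi = sp.exp(-sigma**2 * w**2 / 2)   # zero-mean Gaussian

for n in range(1, 5):
    moment = sp.simplify((-sp.I)**n * sp.diff(Phi, w, n).subs(w, 0))
    print(f"<X^{n}> =", moment)
# Expected output: 0, sigma**2, 0, 3*sigma**4
```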


Joint Random Variables

Let $X$ and $Y$ be two random variables with their associated sample spaces. The actual events associated with $X$ and $Y$ may or may not be independent (e.g. throwing a die may map into $X$; choosing colored marbles from a hat may map into $Y$). The relationship of the events will be described by the joint distribution function of $X$ and $Y$:

$F_{XY}(x, y) \equiv P\{X \le x,\ Y \le y\}$

and the joint probability density function is

$f_{XY}(x, y) \equiv \frac{\partial^2 F_{XY}(x, y)}{\partial x\, \partial y}$ (a two-dimensional PDF).

Note that the one-dimensional PDF of $X$, for example, is obtained by integrating the joint PDF over all $y$:

$f_X(x) = \int dy\, f_{XY}(x, y),$

which corresponds to asking what the PDF of $X$ is given that the certain event for $Y$ occurs.

Example: flip two coins $a$ and $b$. Let heads = 1, tails = 0. Define 2 r.v.'s: $X = a + b$; $Y = a$. With these definitions $X$ and $Y$ are statistically dependent.

Characteristic function of joint r.v.'s:

$\Phi_{XY}(\omega_1, \omega_2) = \langle e^{i(\omega_1 X + \omega_2 Y)} \rangle = \int\!\!\int dx\, dy\, e^{i(\omega_1 x + \omega_2 y)}\, f_{XY}(x, y).$

For $X$, $Y$ independent,

$\Phi_{XY}(\omega_1, \omega_2) = \left[\int dx\, f_X(x)\, e^{i\omega_1 x}\right]\left[\int dy\, f_Y(y)\, e^{i\omega_2 y}\right] \equiv \Phi_X(\omega_1)\, \Phi_Y(\omega_2).$

Example for independent r.v.'s: flip two coins $a$ and $b$. As before, heads = 1 and tails = 0; let $X = a$, $Y = b$ ($X$ and $Y$ are independent).

Independent random variables

Two random variables are said to be independent if the events mapping into one r.v. are independent of those mapping into the other.


In this case, joint probabilities are factorable, so that

$F_{XY}(x, y) = F_X(x)\, F_Y(y)$

$f_{XY}(x, y) = f_X(x)\, f_Y(y).$

Such factorization is plausible if one considers moments of independent r.v.'s:

$\langle X^n Y^m \rangle = \langle X^n \rangle \langle Y^m \rangle,$

which follows from

$\langle X^n Y^m \rangle \equiv \int\!\!\int dx\, dy\, x^n y^m\, f_{XY}(x, y) = \left[\int dx\, x^n f_X(x)\right]\left[\int dy\, y^m f_Y(y)\right].$


Convolution theorem for sums of independent RVs

If $Z = X + Y$ where $X$, $Y$ are independent random variables, then the PDF of $Z$ is the convolution of the PDFs of $X$ and $Y$:

$f_Z(z) = f_X \ast f_Y = \int dx\, f_X(x)\, f_Y(z - x) = \int dx\, f_X(z - x)\, f_Y(x).$

Proof: By definition,

$f_Z(z) = \frac{d}{dz} F_Z(z).$

Consider

$F_Z(z) = P\{Z \le z\}.$

Now, as before, this is

$F_Z(z) = P\{X + Y \le z\} = P\{Y \le z - X\}.$

To evaluate this, first evaluate the probability $P\{Y \le z - x\}$ where $x$ is just a number:

$P\{Y \le z - x\} \equiv F_Y(z - x) \equiv \int_{-\infty}^{z-x} dy\, f_Y(y),$

but $P\{Y \le z - X\}$ is the probability that $Y \le z - x$ over all values of $x$, so we need to integrate over $x$ and weight by the probability of $x$:

$P\{Y \le z - X\} = \int_{-\infty}^{\infty} dx\, f_X(x) \int_{-\infty}^{z-x} dy\, f_Y(y);$

that is, $P\{Y \le z - X\}$ is the expected value of $F_Y(z - x)$. By the Leibniz integration formula,

$\frac{d}{db} \int_a^{g(b)} d\xi\, h(\xi) = h(g(b))\, \frac{dg(b)}{db},$

we obtain the convolution result.
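A numerical illustration (a sketch; it assumes $X \sim N(0,1)$ and $Y \sim \mathrm{Exp}(1)$, both arbitrary choices): the histogram of $Z = X + Y$ should match the convolution integral evaluated on a grid.

```python
import numpy as np
from scipy.stats import norm, expon

# Sketch: verify f_Z = f_X * f_Y for Z = X + Y,
# with X ~ N(0,1) and Y ~ Exp(1) independent.
rng = np.random.default_rng(5)
n = 300_000
z = rng.standard_normal(n) + rng.exponential(1.0, n)

# Empirical PDF of Z from a histogram
hist, edges = np.histogram(z, bins=120, range=(-4.0, 8.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# Convolution integral f_Z(z) = int dx f_X(x) f_Y(z - x) on a grid
x = np.linspace(-8.0, 8.0, 2001)
dx = x[1] - x[0]
f_z = np.array([np.sum(norm.pdf(x) * expon.pdf(c - x)) * dx for c in centers])

print("max |histogram - convolution| =", np.max(np.abs(hist - f_z)))
```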


Characteristic function of $Z = X + Y$

For $X$, $Y$ independent we have

$f_Z = f_X \ast f_Y \;\Longleftrightarrow\; \Phi_Z(\omega) = \langle e^{i\omega Z} \rangle = \Phi_X(\omega)\, \Phi_Y(\omega).$

Variance of $Z$: if the variances of $X$ and $Y$ are $\sigma_X^2$, $\sigma_Y^2$, then the variance of $Z$ is $\sigma_Z^2 = \sigma_X^2 + \sigma_Y^2$.

Assume $X$ and $Y$, and hence $Z$, are zero-mean r.v.'s. Then we have

$\sigma_X^2 = \langle X^2 \rangle = i^{-2}\, \frac{\partial^2 \Phi_X}{\partial \omega^2}(\omega = 0) = -\frac{\partial^2 \Phi_X}{\partial \omega^2}(\omega = 0)$

$\sigma_Y^2 = \langle Y^2 \rangle = -\frac{\partial^2 \Phi_Y}{\partial \omega^2}(\omega = 0)$

Using Price's theorem:

$\sigma_Z^2 = \langle Z^2 \rangle = -\frac{\partial^2 \Phi_Z}{\partial \omega^2}(\omega = 0) = -\frac{\partial^2}{\partial \omega^2}\left[\Phi_X(\omega)\, \Phi_Y(\omega)\right]_{\omega = 0}$

$= -\frac{\partial}{\partial \omega}\left[\Phi_X \frac{\partial \Phi_Y}{\partial \omega} + \Phi_Y \frac{\partial \Phi_X}{\partial \omega}\right]_{\omega = 0}$

$= -\left[\Phi_X \frac{\partial^2 \Phi_Y}{\partial \omega^2} + \Phi_Y \frac{\partial^2 \Phi_X}{\partial \omega^2} + 2\, \frac{\partial \Phi_X}{\partial \omega} \cdot \frac{\partial \Phi_Y}{\partial \omega}\right]_{\omega = 0}.$

Since $\Phi_X(0) = \Phi_Y(0) = 1$ and the first derivatives vanish at $\omega = 0$ for zero-mean r.v.'s, we have "discovered" that variances add (independent variables only):

$\sigma_Z^2 = \sigma_X^2 + \sigma_Y^2.$


Multivariate random variables: $N$-dimensional

The results for the bivariate case are easily extrapolated. If

$Z = X_1 + X_2 + \dots + X_N = \sum_{j=1}^{N} X_j$

where the $X_j$ are all independent r.v.'s, then

$f_Z(z) = f_{X_1} \ast f_{X_2} \ast \dots \ast f_{X_N},$

$\Phi_Z = \prod_{j=1}^{N} \Phi_{X_j}(\omega),$

and

$\sigma_Z^2 = \sum_{j=1}^{N} \sigma_{X_j}^2.$


Central Limit Theorem:

Let

$Z_N = \frac{1}{\sqrt{N}} \sum_{j=1}^{N} X_j$

where the $X_j$ are independent r.v.'s with means and variances

$\mu_j \equiv \langle X_j \rangle, \qquad \sigma_j^2 = \langle X_j^2 \rangle - \langle X_j \rangle^2,$

and the PDFs of the $X_j$'s are almost arbitrary. Restrictions on the distributions of each $X_j$ are that

i) $\sigma_j^2 > m > 0$, $m$ = constant;

ii) $\langle |X|^n \rangle < M$ = constant for $n > 2$.

In the limit $N \to \infty$, $Z_N$ becomes a Gaussian random variable with mean

$\langle Z_N \rangle = \frac{1}{\sqrt{N}} \sum_{j=1}^{N} \mu_j$

and variance

$\sigma_Z^2 = \frac{1}{N} \sum_{j=1}^{N} \sigma_j^2.$

Example: suppose the $X_j$ are all uniformly distributed between $\pm\frac{1}{2}$, so $f_X(x) = \Pi(x)$, the unit rectangle function.


Thus the characteristic function is

$\Phi_j(\omega) = \langle e^{i\omega x_j} \rangle = \frac{\sin \omega/2}{\omega/2}.$

Graphically: [figure comparing $\left(\frac{\sin \omega/2}{\omega/2}\right)^2$ ($N = 2$), $\left(\frac{\sin \omega/2}{\omega/2}\right)^3$ ($N = 3$), and the Gaussian limit $e^{-\omega^2}$ ($N = \infty$).]

From the convolution results we have

$\Phi_{\sqrt{N} Z_N}(\omega) = \left[\frac{\sin \omega/2}{\omega/2}\right]^N.$

From the transformation of random variables we have that

$f_{Z_N}(x) = \sqrt{N}\, f_{\sqrt{N} Z_N}(\sqrt{N}\, x)$

and by the scaling theorem for Fourier transforms

$\Phi_{Z_N}(\omega) = \Phi_{\sqrt{N} Z_N}\!\left(\frac{\omega}{\sqrt{N}}\right) = \left[\frac{\sin(\omega/2\sqrt{N})}{\omega/2\sqrt{N}}\right]^N.$


Now

$\lim_{N \to \infty} \Phi_{Z_N}(\omega) = e^{-\frac{1}{2}\sigma_Z^2 \omega^2}$

or

$f_{Z_N}(x) = \frac{1}{\sqrt{2\pi \sigma_Z^2}}\, e^{-x^2/2\sigma_Z^2}.$

Consistency with this limiting form can be seen by expanding $\Phi_{Z_N}$ for small $\omega$:

$\Phi_{Z_N}(\omega) \approx \left[\frac{\omega/2\sqrt{N} - \frac{1}{3!}\left(\omega/2\sqrt{N}\right)^3}{\omega/2\sqrt{N}}\right]^N = \left[1 - \frac{1}{6}\left(\frac{\omega}{2\sqrt{N}}\right)^2\right]^N \approx 1 - \frac{\omega^2}{24},$

which is identical to the expansion of $\exp(-\omega^2 \sigma_Z^2/2)$ with $\sigma_Z^2 = 1/12$ (the variance of the unit-width uniform distribution), as expected if the CLT holds.
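One can watch this convergence numerically (a sketch; the trial count and the set of $N$ values are arbitrary):

```python
import numpy as np

# Sketch: Z_N = N^{-1/2} * sum of N uniform deviates on [-1/2, 1/2]
# approaches a Gaussian with variance 1/12 as N grows.
rng = np.random.default_rng(6)
trials = 200_000
for N in (2, 3, 10, 50):
    z = rng.uniform(-0.5, 0.5, (trials, N)).sum(axis=1) / np.sqrt(N)
    excess_kurt = np.mean(z**4) / np.var(z)**2 - 3.0
    print(f"N={N:3d}: var = {np.var(z):.5f} (expect 1/12 = {1/12:.5f}), "
          f"excess kurtosis = {excess_kurt:+.4f}")
# The excess kurtosis of a single uniform is -1.2; for Z_N it decays as -1.2/N.
```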


CLT Comments

• A sum of Gaussian RVs is automatically a Gaussian RV (one can show this using characteristic functions).

• Convergence to a Gaussian form depends on the actual PDFs of the terms in the sum and their relative variances.

• Exceptions exist!


CLT: Example of a PDF that does not work

The Cauchy distribution and its characteristic function are

$f_X(x) = \frac{\alpha}{\pi}\, \frac{1}{\alpha^2 + x^2}, \qquad \Phi(\omega) = e^{-\alpha|\omega|}.$

Now

$Z_N = \frac{1}{\sqrt{N}} \sum_{j=1}^{N} x_j$

has the characteristic function

$\Phi_N(\omega) = e^{-N\alpha|\omega|/\sqrt{N}} = e^{-\sqrt{N}\,\alpha|\omega|}.$

By inspection the exponential will not converge to a Gaussian. Instead, the sum of $N$ Cauchy RVs is a Cauchy RV.

Is the Cauchy distribution a legitimate PDF for the CLT? No! The variance diverges:

$\langle X^2 \rangle = \int_{-\infty}^{\infty} dx\, x^2\, \frac{\alpha}{\pi}\, \frac{1}{\alpha^2 + x^2} \to \infty.$
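A simulation makes the failure visible (a sketch; standard Cauchy deviates, arbitrary sample sizes): the width of the scaled sum grows like $\sqrt{N}$ instead of settling into a Gaussian.

```python
import numpy as np

# Sketch: scaled sums of standard Cauchy deviates stay Cauchy.
# The interquartile range of Z_N grows like sqrt(N): no Gaussian limit,
# and the sample variance never settles down.
rng = np.random.default_rng(7)
for N in (10, 100, 10_000):
    z = rng.standard_cauchy((1_000, N)).sum(axis=1) / np.sqrt(N)
    q25, q75 = np.percentile(z, [25, 75])
    print(f"N={N:6d}: IQR = {q75 - q25:9.2f}, sample var = {z.var():.3e}")
```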


A CLT Problem

• Consider a set of $N$ quantities $\{a_i,\ i = 1, \dots, N\}$ that are i.i.d. (independently and identically distributed) with zero mean:

$\langle a_i \rangle = 0, \qquad \langle a_i a_j \rangle = \sigma_a^2\, \delta_{ij}.$

• We are interested in the cross-correlation between all unique pairs:

$C_N = \frac{1}{N_X} \sum_{i<j} a_i a_j = \frac{1}{N_X} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} a_i a_j, \qquad N_X = N(N-1)/2.$

• What do you expect $\langle C_N \rangle$ to be?

• What do you expect the PDF of $C_N$ to be?


A CLT Problem (2)

• Note:
  • The number of independent quantities (random variables) is $N$.
  • The sum $C_N$ has terms that are products of i.i.d. variables.
  • Any given term in the sum is statistically independent of some of the other terms.
  • The PDF of products is different from the PDF of the individual factors.
  • In the limit $N \gg 1$ there should be many independent terms in the sum.

• $N = 2$: can show that the PDF is symmetric (odd-order moments = 0).

• $N > 2$: can show that the third moment $\neq 0$. What gives? (See the simulation sketch below.)
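A Monte Carlo sketch of $C_N$ (added here; it assumes Gaussian $a_i$, which the slide leaves unspecified, and uses the pair-sum identity $\sum_{i<j} a_i a_j = \frac{1}{2}\big[(\sum_i a_i)^2 - \sum_i a_i^2\big]$ to avoid double loops):

```python
import numpy as np
from scipy import stats

# Sketch: Monte Carlo PDF of C_N = (2/(N(N-1))) * sum_{i<j} a_i a_j
# for zero-mean, unit-variance Gaussian a_i.
rng = np.random.default_rng(8)
trials = 200_000
for N in (2, 5, 50):
    a = rng.standard_normal((trials, N))
    pair_sum = 0.5 * (a.sum(axis=1)**2 - (a**2).sum(axis=1))
    c = pair_sum / (N * (N - 1) / 2)          # C_N with N_X = N(N-1)/2
    print(f"N={N:3d}: <C_N> = {c.mean():+.4f}, skewness = {stats.skew(c):+.3f}")
# <C_N> -> 0 as expected, but the skewness vanishes only for N = 2 and does
# not decay toward 0 for large N: the terms share common factors, so they
# are not all independent and the CLT does not apply straightforwardly.
```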


Conditional Probabilities & Bayes' Theorem

We have considered $P(A)$, the probability of an event $A$. Also obeying the axioms of probability are conditional probabilities: $P(B|A)$, the probability of the event $B$ given that the event $A$ has occurred:

$P(B|A) \equiv \frac{P(BA)}{P(A)}$

Recast the axioms as:

I. $P(B|A) \ge 0$

II. $P(B|A) + P(\bar{B}|A) = 1$

III. $P(BA|C) = P(B|C)\,P(A|BC) = P(A|C)\,P(B|AC)$

How does this relate to experiments? Use the product rule:

$P(A|BC) = \frac{P(A|C)\,P(B|AC)}{P(B|C)}$

or, letting M = model (or hypothesis), D = data, and I = background information (assumptions),

$P(M|DI) = \frac{P(M|I)\,P(D|MI)}{P(D|I)}$

Terms:

prior: $P(M|I)$

sampling distribution for D: $P(D|MI)$ (also called the likelihood for M)

prior predictive for D: $P(D|I)$ (also called the global likelihood for M, or the evidence for M)


Particular strengths of the Bayesian method include:

1. One must often be explicit about what is assumed about I, the background information.

2. In assessing models, we get a PDF for parameters rather than just point estimates.

3. Occam's razor (simpler models win, all else being equal) is easily invoked when comparing models. We may have many different models $M_i$ that we wish to compare. Form the odds ratio from the posterior PDFs $P(M_i|DI)$:

$O_{i,j} \equiv \frac{P(M_i|DI)}{P(M_j|DI)} = \frac{P(M_i|I)}{P(M_j|I)} \cdot \frac{P(D|M_iI)}{P(D|M_jI)}.$


Example

Data: $\{k_i\},\ i = 1, \dots, n$, drawn from a Poisson process.

Poisson PDF: $P_k = \frac{\lambda^k e^{-\lambda}}{k!}$

Want: the mean of the process.

Frequentist approach:

We need an estimator for the mean; consider the likelihood

$f(\lambda) = \prod_{i=1}^{n} P(k_i) = \frac{1}{\prod_{i=1}^{n} k_i!}\, \lambda^{\sum_{i=1}^{n} k_i}\, e^{-n\lambda}.$

Maximizing,

$\frac{df}{d\lambda} = 0 = f(\lambda)\left[-n + \lambda^{-1} \sum_{i=1}^{n} k_i\right],$

we obtain as an estimator for the mean

$\bar{k} = \frac{1}{n} \sum_{i=1}^{n} k_i.$


Bayesian approach:

Likelihood (as before):

$P(D|MI) = \prod_{i=1}^{n} P(k_i) = \frac{1}{\prod_{i=1}^{n} k_i!}\, \lambda^{\sum_{i=1}^{n} k_i}\, e^{-n\lambda}.$

Prior:

$P(M|I) = P(\lambda|I)$. Assume a flat prior for $\lambda \ge 0$: $P(\lambda|I) \propto U(\lambda)$, where $U$ is the unit step function.

Prior predictive:

$P(D|I) \equiv \int_{-\infty}^{\infty} d\lambda\, U(\lambda)\, P(D|MI) = \frac{\Gamma(n\bar{k} + 1)}{n^{n\bar{k}+1} \prod_{i=1}^{n} k_i!}.$

Combining all of the above, we find

$P(\lambda|\{k_i\} I) = \frac{n^{n\bar{k}+1}}{\Gamma(n\bar{k} + 1)}\, \lambda^{n\bar{k}}\, e^{-n\lambda}\, U(\lambda),$

a Gamma distribution in $\lambda$. Note that rather than getting a point estimate for the mean, we get a PDF for its value. For hypothesis testing, this is much more useful than a point estimate.
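A sketch of this comparison in code (assumptions: simulated data and the flat prior above; the posterior is a Gamma distribution with shape $n\bar{k}+1$ and rate $n$, which scipy's gamma distribution represents directly):

```python
import numpy as np
from scipy import stats

# Sketch: frequentist point estimate vs. Bayesian posterior for the
# Poisson mean, using simulated counts.
rng = np.random.default_rng(9)
true_lam, n = 3.2, 25
k = rng.poisson(true_lam, n)

kbar = k.mean()                                  # ML estimate
posterior = stats.gamma(a=n * kbar + 1, scale=1.0 / n)

lo, hi = posterior.ppf([0.025, 0.975])
print(f"ML point estimate:     {kbar:.3f}")
print(f"posterior mean:        {posterior.mean():.3f}")
print(f"95% credible interval: [{lo:.3f}, {hi:.3f}]")
```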