
Bayesian Inference

Presenting: Assaf Tzabari

2

Agenda

Basic concepts

Conjugate priors

Generalized Bayes rules

Empirical Bayes

Admissibility

Asymptotic efficiency

3

Basic concepts

x – random vector with density $f(x \mid \theta)$; $\theta$ – unknown parameter with prior density $\pi(\theta)$

Joint density of x and $\theta$: $h(x,\theta) = f(x \mid \theta)\,\pi(\theta)$

Marginal density of x: $m(x) = \int h(x,\theta)\,d\theta$

Posterior density of $\theta$: $\pi(\theta \mid x) = \dfrac{h(x,\theta)}{m(x)}$

4

Basic concepts (cont.)

Elements of a decision problem:

$A$ – the set of all possible decisions

$L(\theta,a)$ – loss function defined for all $\theta \in \Theta$, $a \in A$

$\delta(x): X \to A$ – decision rule

Risk function: $R(\theta,\delta) = E_x\!\left[L(\theta,\delta(x))\right]$

Bayes risk function: $r(\pi,\delta) = E^{\pi}\!\left[R(\theta,\delta)\right]$

5

Basic concepts (cont.)

A Bayes rule is a decision rule $\delta^{\pi}$ which minimizes $r(\pi,\delta)$.

A Bayes rule can be found by choosing, for each x, an action which minimizes the posterior expected loss:

$\int L(\theta,a)\,\pi(\theta \mid x)\,d\theta$

or, equivalently, which minimizes:

$\int L(\theta,a)\,f(x \mid \theta)\,\pi(\theta)\,d\theta$

6

Basic concepts (cont.)

Example: Bayesian estimation under MSE

$L(\theta,a) = (a-\theta)^2$

The posterior expected loss is $\int (a-\theta)^2\,\pi(\theta \mid x)\,d\theta$, and

$\dfrac{d}{da}\int (a-\theta)^2\,\pi(\theta \mid x)\,d\theta = 2a - 2E[\theta \mid x] = 0 \;\Rightarrow\; \delta(x) = E[\theta \mid x]$

so the Bayes estimator under MSE is the posterior mean.
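A minimal numerical sketch of this result (not from the original slides): the posterior mean is approximated on a grid; the normal likelihood, normal prior, and all parameter values below are illustrative assumptions.

```python
import numpy as np
from scipy import stats

# Grid approximation of the Bayes estimator under MSE:
# delta(x) = E[theta | x] = int theta f(x|theta) pi(theta) dtheta / m(x)
theta = np.linspace(-10.0, 10.0, 10001)
dtheta = theta[1] - theta[0]

prior = stats.norm.pdf(theta, loc=0.0, scale=2.0)     # assumed prior: N(0, 4)
x = 1.5                                               # assumed observation
likelihood = stats.norm.pdf(x, loc=theta, scale=1.0)  # x ~ N(theta, 1)

joint = likelihood * prior            # h(x, theta) on the grid
m_x = joint.sum() * dtheta            # marginal density m(x)
posterior = joint / m_x               # pi(theta | x)

delta = (theta * posterior).sum() * dtheta
print(delta)  # ~1.2, matching the exact normal-normal posterior mean (4/5) * x
```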

7

Conjugate priors

Definition:

A class $P$ of prior distributions is a conjugate family for a class $F = \{f(x \mid \theta)\}$ of sample densities if $\pi(\theta \mid x) \in P$ for all $f \in F$, $\pi \in P$.

Example: the class of normal priors is a conjugate family for the class of normal sample densities,

$x \sim N(\theta,\sigma^2),\;\; \theta \sim N(\mu,\tau^2) \;\Rightarrow\; \pi(\theta \mid x) = N(\mu(x),\rho^2)$

where $\mu(x) = \dfrac{\sigma^2\mu + \tau^2 x}{\sigma^2+\tau^2}$ and $\rho^2 = \dfrac{\sigma^2\tau^2}{\sigma^2+\tau^2}$
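A small code sketch of this update (the function name and the example numbers are illustrative assumptions, not from the slides):

```python
def normal_posterior(x, sigma2, mu, tau2):
    """Conjugate update: x ~ N(theta, sigma2), theta ~ N(mu, tau2).

    Returns the posterior mean mu(x) and variance rho2 of theta given x.
    """
    mu_x = (sigma2 * mu + tau2 * x) / (sigma2 + tau2)  # shrinks x toward mu
    rho2 = sigma2 * tau2 / (sigma2 + tau2)             # < min(sigma2, tau2)
    return mu_x, rho2

# A diffuse prior (large tau2) leaves x almost unchanged:
print(normal_posterior(x=1.5, sigma2=1.0, mu=0.0, tau2=100.0))
```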

8

Using conjugate priors

Step 1: Find a conjugate prior

Choose a class with the same form as the likelihood functions $l(\theta) = f(x \mid \theta)$

Step 2: Calculate the posterior

Gather the factors involving $\theta$ in $h(x,\theta)$

9

Using conjugate priors (cont.)

Example: Finding a conjugate prior for the Poisson distribution

$x = (x_1,\ldots,x_n)$ where $x_i \sim P(\theta)$ are iid,

$f(x \mid \theta) = \prod_{i=1}^{n}\dfrac{e^{-\theta}\theta^{x_i}}{x_i!} = c(x)\,e^{-n\theta}\,\theta^{n\bar{x}} \propto l(\theta)$

The factors fit a gamma distribution of $\theta$. Assuming $\theta \sim G(a,b)$, i.e. $\pi(\theta) = c(a,b)\,\theta^{a-1}e^{-\theta/b}$, we get:

$h(x,\theta) = f(x \mid \theta)\,\pi(\theta) = c(x)\,c(a,b)\,\theta^{n\bar{x}+a-1}\,e^{-\theta(n+1/b)}$

The posterior is therefore $G\!\left(n\bar{x}+a,\;\dfrac{1}{n+1/b}\right)$

10

Using conjugate priors (cont.)

Example (cont.): Finding a conjugate prior for the Poisson distribution

$\pi(\theta \mid x) \sim G(a',b')$ where $a' = n\bar{x}+a$ and $b' = \dfrac{1}{n+1/b}$

The Bayes estimator under MSE is then,

$\delta(x) = E[\theta \mid x] = a'b' = \dfrac{n\bar{x}+a}{n+1/b}$

The ML estimator is $\delta_{ML} = \bar{x}$

[Figure: gamma prior densities for (a,b) = (1,2), (2,2), (3,2), (10,0.5).]
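A short simulation sketch of this comparison, assuming illustrative values for a, b, the true rate, and the sample size:

```python
import numpy as np

rng = np.random.default_rng(0)

a, b = 2.0, 2.0                        # gamma prior G(a, b): shape a, scale b
theta_true = 3.0
x = rng.poisson(theta_true, size=20)   # iid Poisson samples
n, xbar = len(x), x.mean()

# Conjugate update from the slide: posterior is G(n*xbar + a, 1/(n + 1/b))
a_post = n * xbar + a
b_post = 1.0 / (n + 1.0 / b)

bayes = a_post * b_post                # posterior mean = Bayes estimate under MSE
ml = xbar                              # ML estimate
print(bayes, ml)                       # bayes is ml shrunk toward the prior mean a*b
```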

11

Using conjugate priors (cont.)

More conjugate priors for common statistical distributions:

Binomial $x \sim b(p,n)$ and Beta prior

$\pi(p) = B(a,b),\quad \pi(p \mid x) = B(a+x,\;b+n-x)$

where $E[p \mid x] = \dfrac{a+x}{a+b+n}$

[Figure: beta densities for (a,b) = (2,2), (0.5,0.5), (2,5), (5,2).]
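A one-function sketch of the Beta-binomial update above (the name and numbers are illustrative):

```python
def beta_posterior(x, n, a, b):
    """Beta-binomial conjugate update: x successes out of n, p ~ Beta(a, b)."""
    a_post, b_post = a + x, b + n - x
    mean = a_post / (a_post + b_post)   # E[p | x] = (a + x) / (a + b + n)
    return a_post, b_post, mean

print(beta_posterior(x=7, n=10, a=2, b=2))  # -> (9, 5, 9/14)
```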

12

Using conjugate priors (cont.)

Uniform iid $x = (x_1,\ldots,x_n)$, $x_i \sim U(0,\theta)$ and Pareto prior

$\pi(\theta) = Pa(\theta_0,a),\quad \pi(\theta \mid x) = Pa\!\left(\max\{\theta_0,x_1,\ldots,x_n\},\;a+n\right)$

where $E[\theta \mid x] = \dfrac{(a+n)\max\{\theta_0,x_1,\ldots,x_n\}}{a+n-1}$

[Figure: Pareto densities for a = 1, 2, 3.]
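A corresponding sketch for the uniform-Pareto update (the data and prior parameters are illustrative assumptions):

```python
import numpy as np

def pareto_posterior(x, theta0, a):
    """Uniform-Pareto conjugate update: x_i ~ U(0, theta), theta ~ Pa(theta0, a)."""
    scale = max(theta0, np.max(x))      # posterior scale parameter
    shape = a + len(x)                  # posterior shape parameter
    mean = shape * scale / (shape - 1)  # E[theta | x], Bayes estimate under MSE
    return scale, shape, mean

x = np.array([0.8, 2.1, 1.3, 3.4])
print(pareto_posterior(x, theta0=1.0, a=2))  # scale=3.4, shape=6, mean=4.08
```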

13

Conjugate priors (cont.)

Advantages:

Easy to calculate the posterior $\pi(\theta \mid x)$

Intuitive

Useful for sequential estimation

Can a conjugate prior be a reasonable approximation to the true prior?

Not always!

14

Conjugate priors (cont.)

Step 1: Subjectively determine $\alpha$-fractiles; a point $z(\alpha)$ is defined as an $\alpha$-fractile if $F(z(\alpha)) = \alpha$

Step 2: Look for a matching prior and find the Bayes estimator

Example: estimating $\theta$ under MSE based on $x \sim N(\theta,1)$

If the median is $z(1/2) = 0$ and the quartiles are $z(1/4) = -1$, $z(3/4) = 1$, there are two matching priors, $\pi^N = N(0,2.19)$ and $\pi^C = C(0,1)$ (Cauchy):

$\delta^N(x) = \dfrac{2.19}{3.19}\,x$ while $\delta^C(x) \approx x - \dfrac{2x}{2+x^2}$ for large $|x|$

Only $\pi^N$ is a conjugate prior, but which gives a better estimator?

15

Improper priors

Improper prior – a prior with infinite mass, $\int \pi(\theta)\,d\theta = \infty$

The Bayes risk has no meaning, but the posterior density usually still exists.

Useful in the following cases:

Prior information is not available (noninformative priors are usually improper)

The parameter space is restricted

16

Generalized Bayes rules

Definition: If $\pi$ is an improper prior, a generalized Bayes rule, for given x, is an action which minimizes

$\int L(\theta,a)\,f(x \mid \theta)\,\pi(\theta)\,d\theta$

or, if $0 < m(x) < \infty$, which minimizes the posterior expected loss.

Example: estimating $\theta > 0$ under MSE based on $x \sim N(\theta,\sigma^2)$

Noninformative (improper) prior: $\pi(\theta) = I_{(0,\infty)}(\theta)$

$\pi(\theta \mid x) = \dfrac{e^{-(x-\theta)^2/2\sigma^2}\,I_{(0,\infty)}(\theta)}{\int_0^{\infty} e^{-(x-\theta)^2/2\sigma^2}\,d\theta}$

$\delta(x) = E[\theta \mid x] = x + \dfrac{\sigma\,e^{-x^2/2\sigma^2}}{\sqrt{2\pi}\;P(Z \le x/\sigma)}$ where $Z \sim N(0,1)$
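A direct transcription of this estimator as code (the function name is hypothetical); unlike the ML estimate x, it always respects the constraint $\theta > 0$:

```python
from scipy.stats import norm

def gen_bayes_positive_mean(x, sigma=1.0):
    """Generalized Bayes estimate of theta > 0 under MSE for x ~ N(theta, sigma^2)
    with the improper prior pi(theta) = I(theta > 0):
    delta(x) = x + sigma * phi(x/sigma) / Phi(x/sigma)."""
    return x + sigma * norm.pdf(x / sigma) / norm.cdf(x / sigma)

# The ML estimate equals x and can be negative; delta(x) never is:
for x in (-3.0, 0.0, 3.0):
    print(x, gen_bayes_positive_mean(x))
```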

17

Generalized Bayes rules (cont.)

[Figure: generalized Bayes estimators (including $\sigma = 2$) and the ML estimator as functions of x.]

18

Generalized Bayes rules (cont.)

Generalized Bayes rules are useful in solving problems which don't include prior information.

Example: location parameter estimation under $L(a-\theta)$

$f(x \mid \theta)$ is a location density with location parameter $\theta$ if $f(x \mid \theta) = f(x-\theta)$

Using $\pi(\theta) = 1$ we get (substituting $y = x-\theta$),

$E^{\pi(\theta \mid x)}\!\left[L(a-\theta)\right] = \int L(a-\theta)\,f(x-\theta)\,d\theta = \int L(y-K)\,f(y)\,dy = E_f\!\left[L(y-K)\right]$, where $K = x-a$

19

Generalized Bayes rules (cont.)

Example (cont.): location parameter estimation under $L(a-\theta)$

The generalized Bayes rule has the form,

$\delta(x) = x - K_0$, where $K_0$ minimizes $E_f\!\left[L(y-K)\right]$

This is a group of invariant rules, and the best invariant rule is the generalized Bayes rule with the prior $\pi(\theta) = 1$.

20

Generalized Bayes rules (cont.)

Example (cont.): location parameter estimation under $L(a-\theta)$

Under MSE, $\delta(x)$ is the posterior mean,

$\delta(x) = \dfrac{\int \theta\,f(x-\theta)\,d\theta}{\int f(x-\theta)\,d\theta} = x - E_f[y]$

For $x = (x_1,\ldots,x_n)$, Pitman's estimator is derived:

$\delta(x) = \dfrac{\int \theta\,f(x_1-\theta,\ldots,x_n-\theta)\,d\theta}{\int f(x_1-\theta,\ldots,x_n-\theta)\,d\theta}$
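A numerical sketch of Pitman's estimator under MSE, assuming a standard-normal location family, where it should reduce to the sample mean:

```python
import numpy as np

def pitman_estimator(x, f, grid):
    """Pitman estimator of a location parameter by numerical integration:
    delta(x) = int theta * prod_i f(x_i - theta) dtheta
             / int prod_i f(x_i - theta) dtheta."""
    like = np.array([np.prod(f(x - t)) for t in grid])  # joint density at each theta
    return np.sum(grid * like) / np.sum(like)           # grid spacing cancels

# Standard-normal location family: the Pitman estimator is the sample mean.
f = lambda u: np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)
x = np.array([0.3, -1.2, 0.9, 2.0])
grid = np.linspace(-10, 10, 4001)
print(pitman_estimator(x, f, grid), x.mean())  # both ~0.5
```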

21

Empirical Bayes

Development of Bayes rules using auxiliary empirical (past or current) data

Methods:

Using past data in constructing the prior

Using past data in estimating the marginal distribution $m(x)$

Dealing simultaneously with several decision problems

$x_1,\ldots,x_n$ – past observations with densities $f_i(x_i \mid \theta_i)$

$x_{n+1}$ – sample information with density $f_{n+1}(x_{n+1} \mid \theta_{n+1})$

22

Determination of the prior from past data

Assumption: $\theta_1,\ldots,\theta_n,\theta_{n+1}$ are parameters from a common prior $\pi$

$\mu_f(\theta),\,\sigma_f^2(\theta)$ – conditional mean and variance of $x_i$

$\mu_m,\,\sigma_m^2$ – marginal mean and variance of $x_i$

Lemma 1:

$\mu_m = E^{\pi}[\mu_f(\theta)]$ and $\sigma_m^2 = E^{\pi}[\sigma_f^2(\theta)] + E^{\pi}\!\left[(\mu_f(\theta)-\mu_m)^2\right]$

Result 1:

If $\mu_f(\theta) = \theta$ and $\sigma_f^2(\theta) = \sigma_f^2$ then $\mu_\pi = \mu_m$ and $\sigma_\pi^2 = \sigma_m^2 - \sigma_f^2$

23

Determination of the prior from past data (cont.)

Step 1: Assume a certain functional form for $\pi$

A conjugate family of priors is convenient

Step 2: Estimate $\mu_\pi$, $\sigma_\pi^2$ based on $x_1,\ldots,x_n$

($x_{n+1}$ can be included too)

If $\mu_f(\theta) = \theta$ and $\sigma_f^2$ is constant then:

Step 2a: Estimate $\mu_m$, $\sigma_m^2$ from the data, e.g.

$\hat{\mu}_m = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\hat{\sigma}_m^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\hat{\mu}_m)^2$

Step 2b: Use Result 1 to calculate $\hat{\mu}_\pi$, $\hat{\sigma}_\pi^2$

24

Determination of the prior from past data (cont.)

Example: $x_i \sim N(\theta_i,1)$ and $\pi$ is assumed to be normal (conjugate prior).

Estimation of $\mu_\pi$ and $\sigma_\pi^2$ is needed for determining the prior:

$\hat{\mu}_\pi = \hat{\mu}_m = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}$ and $\hat{\sigma}_m^2 = s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$

$\hat{\sigma}_\pi^2 = \begin{cases} s^2 - 1 & \text{if } s^2 - 1 \ge 0 \\ 0 & \text{otherwise} \end{cases}$

The prior is $N(\hat{\mu}_\pi,\hat{\sigma}_\pi^2)$ and the Bayes estimate of $\theta_{n+1}$ under MSE,

$\delta(x_{n+1}) = E[\theta_{n+1} \mid x_{n+1}] = \bar{x} + \left(1 - \min\!\left(1,\tfrac{1}{s^2}\right)\right)(x_{n+1}-\bar{x})$
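A compact sketch of this empirical Bayes procedure on simulated data (the prior used to generate the latent means and the sample sizes are assumptions):

```python
import numpy as np

def eb_normal_estimate(past, x_new):
    """Empirical Bayes estimate of theta_{n+1} for x_i ~ N(theta_i, 1)
    with an estimated normal prior N(mu_hat, tau2_hat)."""
    xbar = past.mean()
    s2 = past.var(ddof=1)                  # marginal variance estimate
    shrink = 1.0 - min(1.0, 1.0 / s2)      # = tau2_hat / (tau2_hat + 1)
    return xbar + shrink * (x_new - xbar)

rng = np.random.default_rng(1)
theta = rng.normal(2.0, 1.5, size=50)      # latent means from a common prior
past = rng.normal(theta, 1.0)              # past observations x_1..x_n
print(eb_normal_estimate(past, x_new=5.0)) # 5.0 shrunk toward the grand mean
```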

25

Estimation of marginal distribution from past data

Assumption: the Bayes rule can be represented in terms of $m(x)$

Advantage: no need to estimate the prior

Step 1: Estimate $m(x)$

$x_1,\ldots,x_n,x_{n+1}$ are a sample from the distribution with density $m(x)$. E.g., in the discrete case,

$\hat{m}(j) = \dfrac{\text{the number of } x_i \text{ equal to } j}{n+1}$

Step 2: Estimate the Bayes rule, using $\hat{m}(j)$

26

Estimation of marginal distribution from past data (cont.)

Example: the Bayes estimation of $\theta_{n+1}$ when $x_i \sim P(\theta_i)$, under MSE.

$\delta(x_{n+1}) = E[\theta \mid x_{n+1}] = \dfrac{\int \theta\,f(x_{n+1} \mid \theta)\,\pi(\theta)\,d\theta}{m(x_{n+1})}$

For the Poisson density,

$\theta\,f(x \mid \theta) = \theta\,\dfrac{e^{-\theta}\theta^{x}}{x!} = (x+1)\,\dfrac{e^{-\theta}\theta^{x+1}}{(x+1)!} = (x+1)\,f(x+1 \mid \theta)$

so that

$\delta(x_{n+1}) = \dfrac{(x_{n+1}+1)\,m(x_{n+1}+1)}{m(x_{n+1})}$ and $\hat{\delta}(x_{n+1}) = \dfrac{(x_{n+1}+1)\,\hat{m}(x_{n+1}+1)}{\hat{m}(x_{n+1})}$
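This is the estimator often attributed to Robbins; a sketch on simulated data (the gamma mixing prior and the sample sizes are assumptions):

```python
import numpy as np

def robbins_estimate(past, x_new):
    """Empirical Bayes estimate for Poisson data:
    delta(x) = (x + 1) * m_hat(x + 1) / m_hat(x),
    with m_hat the empirical frequency of each count."""
    data = np.append(past, x_new)          # x_{n+1} included in the sample
    m_hat = lambda j: np.mean(data == j)   # empirical marginal density
    if m_hat(x_new) == 0:
        raise ValueError("x_new was never observed; m_hat(x_new) = 0")
    return (x_new + 1) * m_hat(x_new + 1) / m_hat(x_new)

rng = np.random.default_rng(2)
theta = rng.gamma(3.0, 1.0, size=2000)     # latent rates from a common prior
past = rng.poisson(theta)                  # past Poisson observations
print(robbins_estimate(past, x_new=4))     # compare with E[theta | x=4]
```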

27

Compound decision problems

Independent $x_1,\ldots,x_n$ are observed, where the $\theta_i$ are from a common prior

Goal: simultaneously make decisions involving $\theta_1,\ldots,\theta_n$

The loss is $L(\theta_1,\ldots,\theta_n,a)$

Solution: determine the prior from $x_1,\ldots,x_n$ using empirical Bayes methods

28

Admissibility of Bayes rules

Bayes rules with finite Bayes risk are typically admissible:

If a Bayes rule is unique then it is admissible; e.g., under MSE the Bayes rule is unique. (Proof: any rule R-better than it must be a Bayes rule itself.)

For discrete $\Theta$, assuming that $\pi$ is positive, the Bayes rule is admissible

For continuous $\Theta$, if $R(\theta,\delta)$ is continuous in $\theta$ for every $\delta$, then the Bayes rule is admissible

29

Admissibility of Bayes rules (cont.)

Generalized Bayes rules can be inadmissible and the verification of their admissibility can be difficult.

Example: generalized Bayes estimator of $\boldsymbol\theta$ based on $\mathbf{x} \sim N_p(\boldsymbol\theta,\mathbf{I}_p)$ versus the James-Stein estimator

$\mathbf{x} = (x_1,\ldots,x_p)^t$, $\boldsymbol\theta = (\theta_1,\ldots,\theta_p)^t$, $\pi(\boldsymbol\theta) \equiv 1$, $L(\boldsymbol\theta,\mathbf{a}) = \sum_{i=1}^{p}(\theta_i - a_i)^2$

$\delta^{\pi}(\mathbf{x}) = E[\boldsymbol\theta \mid \mathbf{x}] = \mathbf{x}$ is identical to the ML estimator

$R(\boldsymbol\theta,\delta^{\pi}) = E\!\left[\sum_{i=1}^{p}(x_i-\theta_i)^2\right] = p$

30

Admissibility of Bayes rules (cont.)

Example (cont.): generalized Bayes estimator of $\boldsymbol\theta$ versus the James-Stein estimator

The James-Stein estimator is

$\delta^{JS}(\mathbf{x}) = \left(1 - \dfrac{p-2}{\|\mathbf{x}\|^2}\right)\mathbf{x}$

and its risk is

$R(\boldsymbol\theta,\delta^{JS}) = p - (p-2)^2\,E^{f(\mathbf{x} \mid \boldsymbol\theta)}\!\left[\dfrac{1}{\|\mathbf{x}\|^2}\right]$

For $p > 2$, $\delta^{\pi}(\mathbf{x}) = \mathbf{x}$ is not admissible
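A Monte Carlo sketch of this risk comparison (the choice of $\boldsymbol\theta$, p, and the number of trials are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
p, trials = 10, 20000
theta = np.full(p, 1.0)                      # assumed true mean vector

x = rng.normal(theta, 1.0, size=(trials, p)) # x ~ N_p(theta, I)
norm2 = np.sum(x**2, axis=1)
js = (1.0 - (p - 2) / norm2)[:, None] * x    # James-Stein shrinkage toward 0

mse_ml = np.mean(np.sum((x - theta)**2, axis=1))   # ~ p
mse_js = np.mean(np.sum((js - theta)**2, axis=1))  # < p, since p > 2
print(mse_ml, mse_js)
```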

31

Admissibility of Bayes rules (cont.)

$f(x \mid \boldsymbol\theta)$ is a p-dimensional exponential density if,

$f(x \mid \boldsymbol\theta) = h(x)\,\exp\!\left(\sum_{i=1}^{p}\theta_i\,T_i(x) - B(\boldsymbol\theta)\right)$

e.g., the normal distribution.

Theorem: If x is continuous with a p-dimensional exponential density and $\Theta$ is closed, then any admissible estimator is a generalized Bayes rule

32

Asymptotic efficiency of Bayes estimators

$x_1,\ldots,x_n$ are iid samples with density $f(x_i \mid \theta)$

Definitions:

An estimator $\delta_n(x_1,\ldots,x_n)$ of $\theta$ is defined as asymptotically unbiased if,

$\delta_n - \theta \xrightarrow{d} H$ and $E[H] = 0$

An asymptotically unbiased estimator is defined as asymptotically efficient if its asymptotic variance satisfies,

$v = \dfrac{1}{n\,I(\theta)}$

v – asymptotic variance

I($\theta$) – Fisher information in a single sample

33

Asymptotic efficiency of Bayes estimators (cont.)

Assumptions for the next theorems:

The posterior is a proper, continuous, and positive density, and $\int |\theta|\,\pi(\theta \mid x)\,d\theta < \infty$

The prior can be improper!

The likelihood function $l(\theta) = f(x \mid \theta)$ satisfies regularity conditions

34

Asymptotic efficiency of Bayes estimators (cont.)

Theorem: For large values of n, the posterior distribution is approximately

$N\!\left(\hat{\theta}_l,\;\dfrac{1}{n\,I(\hat{\theta}_l)}\right)$

where $\hat{\theta}_l$ maximizes the likelihood.

Conclusion: Bayes estimators such as the posterior mean are asymptotically unbiased

The effect of the prior declines as n increases

35

Asymptotic efficiency of Bayes estimators (cont.)

Theorem: If $\delta_n$ is the Bayes estimator under MSE then,

$\sqrt{n}\,(\delta_n - \theta) \xrightarrow{d} N\!\left(0,\;\dfrac{1}{I(\theta)}\right)$

Conclusion: the Bayes estimator $\delta_n$ under MSE is asymptotically efficient

36

Asymptotic efficiency of Bayes estimators (cont.)

Example: estimation of p based on a binomial sample $x \sim b(p,n)$ under MSE

The Beta distribution is a conjugate family for the binomial distribution,

$p \sim B(a,b) \;\Rightarrow\; p \mid x \sim B(a+x,\;b+n-x)$

The Bayes estimator under MSE is

$\delta(x) = E[p \mid x] = \dfrac{a+x}{a+b+n}$

The ML estimator is $\hat{p}_{ML} = \arg\max_p \ln f(x \mid p) = \dfrac{x}{n}$, so

$\delta(x) = \dfrac{a + n\,\hat{p}_{ML}}{a+b+n}$
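A quick sketch of how the prior washes out as n grows, anticipating the figures on the next two slides (the prior parameters and the true p are assumptions):

```python
import numpy as np

def bayes_binomial(x, n, a, b):
    """Bayes estimate of p under MSE with a Beta(a, b) prior: (a + x) / (a + b + n)."""
    return (a + x) / (a + b + n)

p_true = 0.3
rng = np.random.default_rng(4)
for n in (10, 1000, 100000):
    x = rng.binomial(n, p_true)
    # Strong prior (a = b = 2000) vs. weak prior (a = b = 2) vs. ML:
    print(n, bayes_binomial(x, n, 2000, 2000), bayes_binomial(x, n, 2, 2), x / n)
```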

37

Asymptotic efficiency of Bayes estimators (cont.)

[Figure: Bayes estimator $\delta(x)$ and ML estimator as functions of x, for a = b = 2000 and for a = b = 2.]

If the prior is concentrated, it determines the estimator: "Don't confuse me with the facts!"

38

Asymptotic efficiency of Bayes estimators (cont.)

[Figure: Bayes estimator (a = b = 2) and ML estimator as functions of x, for n = 10 and for n = 1000.]

For large samples, the Bayes estimator tends to become independent of the prior.

39

Asymptotic efficiency of Bayes estimators (cont.)

More examples of asymptotically efficient Bayes estimators:

Location distributions: if the likelihood function $l(\theta) = f(x \mid \theta)$ satisfies the regularity conditions, then the Pitman estimator after one observation is asymptotically efficient

Exponential distributions: if

$f(x \mid \theta) = e^{\theta\,T(x) - A(\theta)}$

then it satisfies the regularity conditions, and the asymptotic efficiency depends on the prior

40

Conclusions

Bayes rules are designed for problems with prior information, but they are useful in other cases as well

Determining the prior is a crucial step, which affects both admissibility and computational complexity

Bayes estimators under MSE perform well on large samples
