
The Triangle of Statistical Inference: Likelihood

Page 1

[Figure: the triangle of statistical inference, labeled Data, Scientific Model, Probability Model, and Inference]

Page 2

An example...

The Data:
$x_i$ = measurements of DBH on 50 trees
$y_i$ = measurements of crown radius on those trees

The Scientific Model:
$y_i = \alpha + \beta x_i + \varepsilon_i$ (linear relationship, with 2 parameters ($\alpha$, $\beta$) and an error term ($\varepsilon$), the residuals)

The Probability Model:
$\varepsilon$ is normally distributed, with $E[\varepsilon] = 0$ and variance estimated from the observed variance of the residuals...

[Figure: two copies of the triangle of Data, Scientific Model (hypothesis), Probability Model, and Inference]

Page 3

So what is likelihood, and what is it good for?

1. Probability based ("inverse probability"). "[A] mathematical quantity that appears to be appropriate for measuring our order of preference among different possible populations but does not in fact obey the laws of probability"

-- R.A. Fisher

2. Foundation of the theory of statistics.

3. Enables comparison of alternate models.

Page 4

So what is likelihood, and what is it good for?

Scientific hypotheses cannot be treated as outcomes of trials (probabilities) because we will never have the full set of possible outcomes.

However, we can calculate the probability of obtaining the results, given our model (scientific hypothesis): P(data|model).

Likelihood is proportional to this probability.

Page 5

Likelihood is proportional to probability

$$P(\text{data} \mid \text{hypothesis}(\theta)) \propto L(\theta \mid \text{data})$$

$$P(\text{data} \mid \text{hypothesis}(\theta)) = k\,L(\theta \mid \text{data})$$

In plain English: "The likelihood (L) of the set of parameters ($\theta$) (in the scientific model), given the data (x), is proportional to the probability of observing the data, given the parameters..."

{and this probability is something we can calculate, using the appropriate underlying probability model (i.e. a PDF)}

Page 6

Parameter values can specify your hypotheses

$$P(\text{data}_i \mid \theta) = k\,L(\theta \mid \text{data})$$

Left-hand side: the parameter is fixed, the data are variable. What is the probability of observing the data if our model and parameters are correct?

Right-hand side: the parameter is variable, the data are fixed. What is the likelihood of the parameter given the data?

Page 7

General Likelihood Function

$$L(\theta \mid x) = c\,g(x \mid \theta)$$

Here $L(\theta \mid x)$ is the likelihood function, $x$ is the data ($x_i$), $\theta$ is the set of parameters in the probability model, and $g$ is the probability density function or discrete density function.

c is a constant, and thus unimportant in comparison of alternate hypotheses or models as long as the data remain constant.

[Figure: a probability density curve; y-axis "Probability" from 0 to 0.4, x-axis from -4 to 5]

Page 8

General Likelihood Function

$$L(\theta \mid x) = \prod_{i=1}^{n} g(x_i \mid \theta)$$

Here $L(\theta \mid x)$ is the likelihood function, $x_i$ are the data, $\theta$ is the set of parameters in the probability model, and $g$ is the probability density function or discrete density function.

[Figure: a probability density curve; y-axis "Probability" from 0 to 0.4, x-axis from -4 to 5]

The parameters of the PDF are determined by the data and by the values of the parameters in the scientific model!
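As a minimal Python sketch of this product form, the block below evaluates the likelihood for the crown-radius example from Page 2 under a normal probability model. The five (DBH, radius) pairs, the parameter values, and the function names are invented for illustration and do not come from the slides.

```python
import math

def normal_pdf(x, mean, sd):
    """Normal probability density g(x | mean, sd)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def likelihood(alpha, beta, sd, dbh, radius):
    """L(theta | data): product of the densities of the individual observations."""
    total = 1.0
    for x_i, y_i in zip(dbh, radius):
        predicted = alpha + beta * x_i           # scientific model sets the mean
        total *= normal_pdf(y_i, predicted, sd)  # probability model scores the observation
    return total

# Hypothetical data: DBH (cm) and crown radius (m) for five trees.
dbh = [10.0, 20.0, 30.0, 40.0, 50.0]
radius = [1.1, 2.0, 2.9, 4.2, 5.0]

good = likelihood(0.0, 0.1, 0.3, dbh, radius)  # slope close to the data
poor = likelihood(0.0, 0.2, 0.3, dbh, radius)  # slope far from the data
print(good > poor)
```

The mean of each density comes from the scientific model, so changing alpha or beta changes the PDF that each observation is scored against.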

Page 9

Likelihood Axiom

"Within the framework of a statistical model, a set of data supports one statistical hypothesis better than another if the likelihood of the first hypothesis, on the data, exceeds the likelihood of the second hypothesis."

(Edwards 1972, p.)

Page 10

How to derive a likelihood function: Binomial

Event: 10 trees die out of a population of 50

Question: What is the mortality rate (p)?

Probability Density Function:

$$g(x) = \binom{n}{x} p^{x} (1-p)^{n-x}$$

Likelihood:

$$L(p \mid x) = c\,g(x \mid p)$$

$$L(p \mid 10) \propto g(10 \mid p) \propto p^{10} (1-p)^{50-10}$$

The most likely parameter value is 10/50 = 0.20

Page 11

Likelihood Profile: Binomial

$$L(p \mid 10) \propto g(10 \mid p) \propto p^{10} (1-p)^{50-10}$$

[Figure: likelihood (y-axis, 0 to about 1.6E-11) plotted against the value of the estimated parameter p (x-axis, 0 to 1)]

The model (parameter p) is defined by the data!
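The profile can be reproduced with a few lines of Python. This is a sketch that assumes the binomial coefficient is dropped as the constant c, which is consistent with the scale of the plotted likelihood axis; the grid spacing and function name are arbitrary choices.

```python
def binomial_likelihood(p, deaths=10, n=50):
    """Likelihood of mortality rate p, dropping the constant binomial coefficient."""
    return p ** deaths * (1.0 - p) ** (n - deaths)

# Evaluate the likelihood on a grid of candidate mortality rates.
grid = [i / 100 for i in range(1, 100)]
profile = [binomial_likelihood(p) for p in grid]

# The grid value with the highest likelihood approximates the MLE.
best_likelihood, p_hat = max(zip(profile, grid))
print(p_hat, best_likelihood)
```

The peak sits at p = 0.2, the observed proportion 10/50.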

Page 12

An example: Can we predict tree fecundity as a function of tree size?

The Data:
$x_i$ = measurements of DBH on 50 trees
$y_i$ = counts of seeds produced by trees

The Scientific Model:
$y_i = \mathrm{DBH}_i^{\beta} + \varepsilon$ (exponential relationship, with 1 parameter ($\beta$) and an error term ($\varepsilon$))

The Probability Model:
Data follow a Poisson distribution, with $E[x] = \text{variance} = \lambda$

[Figure: the triangle of Data, Scientific Model, Probability Model, and Inference]

Page 13

Iterative process

1. Pick a value for the parameter in your scientific model, $\beta$. Recall that the scientific model is $y_i = \mathrm{DBH}_i^{\beta}$.

2. For each data point, calculate the expected (predicted) value for that value of $\beta$.

3. Calculate the probability of observing what you observed given that parameter value and your probability model.

4. Multiply the probabilities of the individual observations.

5. Go back to 1 until you find the maximum likelihood estimate for parameter $\beta$.
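The five steps can be sketched in Python as a plain grid search over the parameter. This stands in for whatever optimizer is used in practice. The six trees, their seed counts, and the candidate range 0 to 4 are hypothetical, and the model form (DBH raised to the power beta, with Poisson errors) is taken from the fecundity example.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

def likelihood(beta, dbh, seeds):
    """Steps 2 to 4: predict, score each observation, multiply."""
    total = 1.0
    for d, y in zip(dbh, seeds):
        predicted = d ** beta               # step 2: expected seed count
        total *= poisson_pmf(y, predicted)  # steps 3 and 4
    return total

# Hypothetical data: DBH and seed counts for six trees.
dbh = [1.2, 1.5, 2.0, 2.4, 3.0, 3.5]
seeds = [1, 2, 4, 6, 9, 12]

# Steps 1 and 5: try candidate values of beta and keep the best one.
candidates = [i / 100 for i in range(0, 401)]
beta_hat = max(candidates, key=lambda b: likelihood(b, dbh, seeds))
print(beta_hat)
```

A grid search is the simplest way to make the loop explicit, but it scales poorly beyond one or two parameters.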

[Figure: two copies of the triangle of Data, Scientific Model (hypothesis), Probability Model, and Inference]

Page 14

Likelihood Poisson Process

$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$$

$E[x] = \lambda$

Page 15

First pass…

Model: $y_i = \mathrm{DBH}_i^{\beta}$

Predicted = 0.0617
Observed = 2

$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \qquad E[x] = \lambda$$

$$P(X = 2) = \frac{e^{-\mathrm{pred}}\,\mathrm{pred}^{x}}{x!} \approx 0.0017$$

[Figure: probability (y-axis, 0 to 1) against number of seeds (x-axis, 0 to 2) for a Poisson random variable with $E[x_1] = 0.0617$]

Do for n observations……

Page 16

Pick a new value of beta...

Model: $y_i = \mathrm{DBH}_i^{\beta}$

Predicted = 0.498
Observed = 2

$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$$

$$P(X = 2) = \frac{e^{-\mathrm{pred}}\,\mathrm{pred}^{x}}{x!} \approx 0.075$$

[Figure: probability (y-axis, 0 to 0.7) against number of seeds (x-axis, 0 to 4) for a Poisson random variable with $E[x_1] = 0.498$]

Do for n observations……
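The two probabilities can be recomputed directly from the Poisson formula. This Python sketch uses the rounded predicted values printed on the slides (0.0617 and 0.498), so the last digit may differ from the slide values; the function and variable names are illustrative.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

observed = 2
p_first = poisson_pmf(observed, 0.0617)  # predicted value from the first pass
p_second = poisson_pmf(observed, 0.498)  # predicted value after changing beta
print(p_first, p_second)
```

The second value of beta makes the observed count of 2 far more probable, which is why it is the better candidate for this observation.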

Page 17

Probability and Likelihood

1. Multiplying probabilities is not convenient from a computational point of view: the product of many small numbers quickly becomes too small to represent.

2. Instead, we take the log of each probability, sum the logs, and maximize that sum (the log-likelihood).

3. Because the logarithm is monotonic, the parameter value that maximizes the log-likelihood also maximizes the likelihood. This gives us the Maximum Likelihood Estimate of the parameter.
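A short Python illustration of the computational problem, using 100 hypothetical observations that each have probability 1e-5 under the model: the raw product underflows in double precision, while the sum of logs is an ordinary number.

```python
import math

# Hypothetical: 100 observations, each with probability 1e-5 under the model.
probabilities = [1e-5] * 100

product = 1.0
for p in probabilities:
    product *= p  # underflows to 0.0 in double precision

log_likelihood = sum(math.log(p) for p in probabilities)
print(product, log_likelihood)
```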

Page 18

Likelihood Profile

Model: $y_i = \mathrm{DBH}_i^{\beta}$

[Figure: log likelihood (y-axis, -170.5 to -165.0) plotted against Beta (x-axis, 0 to 1.2), with the ML estimate marked]

Page 19

Model comparison

[Figure: two copies of the triangle of Data, Scientific Model (hypothesis), Probability Model, and Inference]

The Data:
$x_i$ = measurements of DBH on 50 trees
$y_i$ = counts of seeds produced by trees

The Scientific Models:
$y_i = \mathrm{DBH}_i^{\beta} + \varepsilon$ (exponential relationship, with 1 parameter ($\beta$))

OR

$y_i = \beta\,\mathrm{DBH}_i + \varepsilon$ (linear relationship, with 1 parameter ($\beta$))

The Probability Model:
Data follow a Poisson distribution, with $E[x] = \text{variance} = \lambda$

Page 20

Model comparison

[Figure: two copies of the triangle of Data, Scientific Model (hypothesis), Probability Model, and Inference]

The Data:
$x_i$ = measurements of DBH on 50 trees
$y_i$ = counts of seeds produced by trees

The Scientific Model:
$y_i = \mathrm{DBH}_i^{\beta} + \varepsilon$ (exponential relationship, with 1 parameter ($\beta$))

The Probability Model:
Data follow a Poisson distribution, with $E[x] = \text{variance} = \lambda$

OR

Data follow a negative binomial distribution with $E[x] = m$ and clumping parameter $k$. (The variance is defined by $m$ and $k$, which is estimated.)

Page 21

Determination of appropriate likelihood function

FIRST PRINCIPLES
1. Proportions: Binomial
2. Several categories: Multinomial
3. Count events: Poisson, Neg. binomial
4. Continuous data, additive processes: Normal
5. Quantities from multiplicative probabilities: Lognormal, Gamma

EMPIRICAL
1. Examine residuals.
2. Test different probability distributions for model errors.

Probability models can be thought of as competing hypotheses in exactly the same way that different parameter values (structural models) are competing hypotheses.

Page 22

Likelihood functions: An aside about logarithms

Taking the logarithm in base a of a number is the inverse of raising a to a power. Example: $\log_{10} 1000 = 3$, because $10^3 = 1000$.

Basic Log Operations

$$\log(a \cdot b) = \log(a) + \log(b)$$

$$\log\left(\frac{a}{b}\right) = \log(a) - \log(b)$$

$$\log(b^{a}) = a \log(b)$$

$$\log_a(a) = 1$$
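These rules are what turn the product in a likelihood into a sum. As a worked step, applying the product rule repeatedly to the general likelihood function (written here with the same symbols, $g$ for the density and $\theta$ for the parameters) gives:

```latex
\ln L(\theta \mid x)
  = \ln \prod_{i=1}^{n} g(x_i \mid \theta)
  = \sum_{i=1}^{n} \ln g(x_i \mid \theta)
```

Each log-likelihood formula for a specific distribution is this sum with the corresponding density substituted for $g$.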

Page 23

Poisson Likelihood Function

Discrete Density Function:

$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$$

Likelihood:

$$L(\lambda \mid x) = \prod_{i=1}^{n} \frac{e^{-\lambda}\lambda^{x_i}}{x_i!}$$

$$\text{Loglikelihood}(\lambda \mid x) = \sum_{i=1}^{n} \left[ -\lambda + x_i \ln \lambda - \ln(x_i!) \right]$$

$E[X] = \text{Variance} = \lambda$
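A minimal Python sketch of the Poisson log-likelihood, assuming a single shared mean for all observations. The six counts are hypothetical, and `math.lgamma(x + 1)` is used for ln(x!).

```python
import math

def poisson_log_likelihood(lam, counts):
    """Sum over observations of -lam + x*ln(lam) - ln(x!)."""
    return sum(-lam + x * math.log(lam) - math.lgamma(x + 1) for x in counts)

counts = [0, 2, 1, 3, 4, 2]  # hypothetical seed counts
grid = [i / 100 for i in range(1, 1001)]
lam_hat = max(grid, key=lambda lam: poisson_log_likelihood(lam, counts))
print(lam_hat, sum(counts) / len(counts))
```

For a constant mean, the value that maximizes the log-likelihood coincides with the sample mean, which gives a quick check on the function.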

Page 24

Negative Binomial Distribution Likelihood Function

Discrete Density Function:

$$\Pr(X = n) = \frac{\Gamma(k+n)}{\Gamma(k)\,n!} \left(1 + \frac{m}{k}\right)^{-k} \left(\frac{m}{m+k}\right)^{n}$$

Likelihood:

$$L = \prod_{i=1}^{N} \frac{\Gamma(k+n_i)}{\Gamma(n_i+1)\,\Gamma(k)} \cdot \frac{m_i^{n_i}\,k^{k}}{(m_i+k)^{(k+n_i)}}$$

$$\text{Loglikelihood}(m, k \mid n) = N\left[k\ln(k) - \ln\Gamma(k)\right] + \sum_{i=1}^{N}\left[\ln\Gamma(n_i+k) + n_i\ln(m_i) - \ln\Gamma(n_i+1) - (n_i+k)\ln(m_i+k)\right]$$

$E[X] = m$, $\text{Variance} = m + \dfrac{m^2}{k}$

k is an estimated parameter!
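The log-likelihood translates into Python term by term. In this sketch the function names and the values m = 3.0 and k = 1.5 are hypothetical. The final lines only check that the density sums to 1, which is a quick way to catch a sign error.

```python
import math

def neg_binomial_log_pmf(n, m, k):
    """ln Pr(X = n) for mean m and clumping parameter k."""
    return (math.lgamma(k + n) - math.lgamma(k) - math.lgamma(n + 1)
            + k * math.log(k) + n * math.log(m)
            - (k + n) * math.log(m + k))

def neg_binomial_log_likelihood(counts, means, k):
    """Sum of log probabilities; means holds the predicted m_i for each count."""
    return sum(neg_binomial_log_pmf(n, m, k) for n, m in zip(counts, means))

# Sanity check with hypothetical values: the probabilities should sum to 1.
total = sum(math.exp(neg_binomial_log_pmf(n, 3.0, 1.5)) for n in range(500))
print(total)
```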

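Page 25

Normal Distribution Likelihood Function

Prob. Density Function:

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

Likelihood:

$$L(\mu, \sigma \mid x) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right)$$

$$\text{LogLikelihood}(\mu, \sigma \mid x) = -\frac{n}{2}\left[\ln(2\pi) + \ln(\sigma^2)\right] - \sum_{i=1}^{n} \frac{(x_i-\mu)^2}{2\sigma^2}$$

$E[x] = \mu$, $\text{Variance} = \sigma^2$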

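Page 26

Lognormal Distribution Likelihood Function

Prob. Density Function:

$$f(x) = \frac{1}{x\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right)$$

Likelihood:

$$L(\mu, \sigma \mid x) = \prod_{i=1}^{n} \frac{1}{x_i\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(\ln x_i - \mu)^2}{2\sigma^2}\right)$$

$$\text{Loglikelihood}(\mu, \sigma \mid x) = -\frac{n}{2}\left[\ln(2\pi) + \ln(\sigma^2)\right] - \sum_{i=1}^{n}\left[\frac{\left(\ln(x_i) - \ln(\hat{x}_i)\right)^2}{2\sigma^2} + \ln(x_i)\right]$$

$$E(\mu) = \ln(\hat{x}), \qquad \sigma^2 = \frac{\sum_{i=1}^{n}\left[\ln(x_i) - \ln(\hat{x}_i)\right]^2}{n}$$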

Page 27

Gamma Distribution Likelihood Function

Prob. Density Function:

$$f(x) = \frac{1}{s^{a}\,\Gamma(a)}\, x^{a-1} e^{-x/s}$$

$a$ = shape parameter
$s$ = scale parameter

$E[x] = as$, $\mathrm{Var}[X] = as^2$

$$\text{LogLik} = \sum_{i=1}^{n}\left[-a\ln(s) - \ln\Gamma(a) + (a-1)\ln(x_i) - \frac{x_i}{s}\right]$$

Page 28

Exponential Distribution Likelihood Function

Prob. Density Function:

$$f(x) = \lambda e^{-\lambda x}$$

Likelihood:

$$L(\lambda \mid x) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i}$$

$$\text{LogLikelihood}(\lambda \mid x) = \sum_{i=1}^{n}\left[\ln(\lambda) - \lambda x_i\right]$$

$E[x] = \dfrac{1}{\lambda}$, $\text{Variance} = \dfrac{1}{\lambda^2}$
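For the exponential distribution the maximum can be found by hand, which makes a useful check on any numerical optimizer. As a worked sketch, writing the log-likelihood as $n\ln\lambda - \lambda\sum x_i$ and setting its derivative with respect to $\lambda$ to zero:

```latex
\frac{d}{d\lambda}\left[ n\ln(\lambda) - \lambda \sum_{i=1}^{n} x_i \right]
  = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i = 0
\quad\Longrightarrow\quad
\hat{\lambda} = \frac{n}{\sum_{i=1}^{n} x_i} = \frac{1}{\bar{x}}
```

The estimate is the reciprocal of the sample mean, consistent with $E[x] = 1/\lambda$.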

Page 29

Evaluating the strength of evidence for the MLE

Now that you have an MLE, how should you evaluate it?

Page 30

Two purposes of support/confidence intervals

• Measure of support for alternate parameter estimates.

• Help with fitting when something goes wrong.

Page 31

Methods of calculating support intervals

• Bootstrapping

• Likelihood curves and profiles

Page 32

Bootstrapping

• Resample the data with replacement and record the number of times that the parameter estimate fell within an interval.

• Frequentist approach: If I sampled my data a large number of times, what would my confidence in the estimate be?
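One common variant is the percentile bootstrap, sketched below in Python for the binomial mortality example (10 deaths among 50 trees). The number of resamples (2000), the fixed seed, and the choice of the 2.5th and 97.5th percentiles are assumptions made for illustration and are not specified in the slides.

```python
import random

def mle_mortality(outcomes):
    """Binomial MLE of the mortality rate: deaths divided by trees."""
    return sum(outcomes) / len(outcomes)

random.seed(1)              # fixed seed so the sketch is repeatable
data = [1] * 10 + [0] * 40  # 10 deaths among 50 trees

estimates = []
for _ in range(2000):
    resample = random.choices(data, k=len(data))  # sample with replacement
    estimates.append(mle_mortality(resample))

estimates.sort()
lower = estimates[int(0.025 * len(estimates))]      # about the 2.5th percentile
upper = estimates[int(0.975 * len(estimates)) - 1]  # about the 97.5th percentile
print(lower, upper)
```

The interval is read off the sorted estimates, so it needs no assumption about the shape of the likelihood surface.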

Page 33

General method

• Draw the likelihood curve (one parameter) or surface (two parameters) or n-dimensional space (n-parameters).

• Figure out how much the likelihood changes as the parameter of interest moves away from the MLE.

Page 34

Strength of evidence for particular parameter estimates: "Support"

• Likelihood provides an objective measure of the strength of evidence for different parameter estimates...

Log-likelihood = "Support" (Edwards 1992)

[Figure: log-likelihood (y-axis, -155 to -147) plotted against the parameter estimate (x-axis, 2 to 2.8)]

Page 35

Asymptotic vs. Simultaneous M-Unit Support Limits

• Asymptotic:
– Hold all other parameters at their MLE values, and systematically vary the remaining parameter until the log-likelihood declines by a chosen amount (m)

[Figure: log-likelihood (y-axis, -155 to -147) against the parameter estimate (x-axis, 2 to 2.8), with the maximum likelihood estimate and the 2-unit support interval marked]

What should "m" be? (1.92 is a good number, and is roughly analogous to a 95% CI)
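As a Python sketch of an m-unit support interval, the block below reuses the binomial mortality example (10 deaths among 50 trees) with m = 1.92. With a single parameter there is nothing else to hold fixed, so the asymptotic and simultaneous limits coincide. The grid spacing of 0.001 is an arbitrary choice.

```python
import math

def log_likelihood(p, deaths=10, n=50):
    """Binomial log-likelihood, constant term dropped."""
    return deaths * math.log(p) + (n - deaths) * math.log(1.0 - p)

m = 1.92         # chosen drop in support
p_hat = 10 / 50  # MLE from the binomial example
threshold = log_likelihood(p_hat) - m

# Keep every grid value whose support is within m units of the maximum.
grid = [i / 1000 for i in range(1, 1000)]
supported = [p for p in grid if log_likelihood(p) >= threshold]
lower, upper = supported[0], supported[-1]
print(lower, upper)
```

The interval is not symmetric around 0.2 because the log-likelihood curve is steeper on the low side of the MLE than on the high side.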

Page 36

An aside on the Likelihood Ratio Test

• For nested models, the likelihood ratio statistic (R), which is twice the difference in log-likelihoods, approximately follows a chi-square distribution with degrees of freedom equal to the difference in the number of parameters between models A and B.

$$R = -2\left[ L(Y \mid M_A) - L(Y \mid M_B) \right]$$

Here $L(Y \mid M)$ is the maximized log-likelihood of the data $Y$ under model $M$, and $M_A$ is the simpler model.
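A small Python sketch of the test for two nested models that differ by one parameter. The two log-likelihood values are hypothetical, and 3.84 is the 95th percentile of the chi-square distribution with 1 degree of freedom. Half of that value is 1.92, which is where the suggested support limit comes from.

```python
def likelihood_ratio_statistic(loglik_simple, loglik_complex):
    """R = -2 * (log-likelihood of simpler model - log-likelihood of complex model)."""
    return -2.0 * (loglik_simple - loglik_complex)

# Hypothetical maximized log-likelihoods for two nested models that differ
# by one parameter.
loglik_a = -168.4  # simpler model A
loglik_b = -165.3  # more complex model B

R = likelihood_ratio_statistic(loglik_a, loglik_b)
CHI2_CRITICAL_1DF = 3.84  # 95th percentile of chi-square with 1 d.f.
print(R, R > CHI2_CRITICAL_1DF)
```

Here the extra parameter improves the log-likelihood by more than 1.92 units, so R exceeds the critical value.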

Page 37

Asymptotic vs. Simultaneous M-Unit Support Limits

• Simultaneous:
– Resampling method: draw a very large number of random sets of parameters and calculate the log-likelihood of each. M-unit simultaneous support limits for parameter $x_i$ are the upper and lower limits of $x_i$ among the parameter sets whose log-likelihood is within m units of support of the maximum.

• Alternatively, set the focal parameter to a range of values and, for each value, optimize the likelihood over all the other parameters.

In practice, it can require an enormous number of iterations to do this if there are more than a few parameters.

Page 38

Asymptotic vs. Simultaneous Support Limits

[Figure: a hypothetical likelihood surface for 2 parameters (Parameter 1 on the x-axis, Parameter 2 on the y-axis), showing the 2-unit drop in support, the asymptotic 2-unit support limits for P1, and the simultaneous 2-unit support limits for P1]

Page 41

Other measures of strength of evidence for different parameter estimates

• Edwards (1992; Chapter 5)
– Various measures of the "shape" of the likelihood surface in the vicinity of the MLE...

How pointed is the peak?...

Page 42

Evaluating Support for Parameter Estimates

• Traditional confidence intervals and standard errors of the parameter estimates can be generated from the Hessian matrix

– Hessian = matrix of second partial derivatives of the log-likelihood function with respect to the parameters, evaluated at the maximum likelihood estimates

– Also called the "Information Matrix" by Fisher (strictly, the information matrix is the negative of this Hessian)

– Provides a measure of the steepness of the likelihood surface in the region of the optimum

– Evaluated at the MLE, the negative Hessian is the observed information matrix

– Can be generated in R using optim (with hessian = TRUE)

Page 43

An example from R

• The Hessian matrix (when maximizing a log-likelihood) is a numerical approximation of the matrix of second partial derivatives of the log-likelihood function, evaluated at the maximum likelihood estimates; its negative is the observed version of Fisher's Information Matrix. Thus, it is a measure of the steepness of the drop in the likelihood surface as you move away from the MLE.

> res$hessian
             a          b       sd
a     -150.182  -2758.360   -0.201
b    -2758.360 -67984.416   -5.925
sd      -0.202     -5.926 -299.422

Page 44

The Hessian CI

• Now invert the negative of the Hessian matrix to get the matrix of parameter variances and covariances

> solve(-1*res$hessian)
               a             b            sd
a   2.613229e-02 -1.060277e-03  3.370998e-06
b  -1.060277e-03  5.772835e-05 -4.278866e-07
sd  3.370998e-06 -4.278866e-07  3.339775e-03

• The square roots of the diagonals of the inverted negative Hessian are the standard errors (and the estimate ± 1.96 × S.E. is an approximate 95% CI)

> sqrt(diag(solve(-1*res$hessian)))
       a        b       sd
  0.1616 0.007597  0.05779

• Are we reverting to a frequentist framework?
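The same arithmetic can be checked outside R. The Python sketch below inverts the negative of the printed Hessian with a small hand-written Gauss-Jordan routine (a stand-in for R's `solve`, not the method used in the slides) and takes the square roots of the diagonal. Because the printed Hessian is rounded to three decimals, the results agree with the R output only to about the digits shown.

```python
import math

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Hessian values as printed in the R output (rounded to three decimals).
hessian = [[-150.182, -2758.360, -0.201],
           [-2758.360, -67984.416, -5.925],
           [-0.202, -5.926, -299.422]]

covariance = invert([[-v for v in row] for row in hessian])
std_errors = [math.sqrt(covariance[i][i]) for i in range(3)]
print(std_errors)
```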

Page 45

Some references

Edwards, A.W.F. 1972. Likelihood. Cambridge University Press.

Feller, W. 1968. An Introduction to Probability Theory and Its Applications. Wiley & Sons.