Hypothesis Testing


Page 1

Hypothesis Testing

Page 2

Statistical Inference – dealing with parameter and model uncertainty:

Confidence Intervals (credible intervals)

Hypothesis Tests

Goodness-of-fit

Model Selection (AIC)

Model averaging

Bayesian Model Updating

Page 3

Statistical Testing of Hypotheses

Objective: determine whether parameters differ from hypothesized values.

The testing procedure is framed as a comparison of null and alternative hypotheses:

Null hypothesis: $H_0: \theta = \theta_0$

Alternative hypothesis: $H_a: \theta \neq \theta_0$

Compound (1-sided) alternatives: $H_a: \theta > \theta_0$ or $H_a: \theta < \theta_0$

Page 4

Procedure for Null Hypothesis Testing

Specify the null and alternative hypotheses.

Compute the test statistic: a random variable that summarizes the sampling distribution expected if the null hypothesis is true (e.g., the difference between sample means for two groups when the true means are equal).

Compare the observed value of the statistic to this distribution. The test is a binary decision at significance level α.

Two types of incorrect decisions:

Rejecting H0 when it is true (Type I error), Pr = α

Not rejecting H0 when it is false (Type II error), Pr = β

Power of the test = 1 − β
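As an illustration of the full procedure, here is a minimal sketch in Python; the data, the choice of a two-sample t test, and α = 0.05 are all made up for this example:

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for two groups (made-up data)
group1 = np.array([4.1, 3.8, 5.0, 4.6, 4.3, 3.9, 4.8])
group2 = np.array([5.2, 4.9, 5.5, 5.1, 4.7, 5.4, 5.0])

# H0: the two group means are equal; Ha: they differ (two-sided)
t_stat, p_value = stats.ttest_ind(group1, group2)

alpha = 0.05  # significance level, chosen before looking at the data
decision = "reject H0" if p_value < alpha else "do not reject H0"
print(f"t = {t_stat:.2f}, p = {p_value:.4f}: {decision}")
```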

Page 5

P-values

Probability of obtaining a test statistic at least as extreme as the observed one, given that the null hypothesis is true.

Not Pr(the null hypothesis is true).

A measure of the consistency of the data with the null, not of the strength of evidence for the alternative.

Dependent on the null hypothesis (if the null is that the groups differ by 1 rather than 0, the p-value will be different).

Dependent on sample size.

Does not provide information on the size or precision of the estimated effect (i.e., it is not a measure of biological relevance, and it is not a confidence interval).
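The dependence on sample size is easy to see by simulation. In this sketch (simulated normal data; the fixed true effect of 0.3 SD is an arbitrary choice), the same true effect yields very different p-values as n grows:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
effect = 0.3  # fixed true difference between group means (in SD units)

# Same true effect, increasing sample size: the p-value shrinks with n
for n in (10, 50, 200, 1000):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(effect, 1.0, n)
    p = stats.ttest_ind(a, b).pvalue
    print(f"n = {n:4d}: p = {p:.4f}")
```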

Page 6

Reality vs. conclusion:

We don't reject H0 (the null hypothesis):

If H0 is true, Ha false: probability 1 − α (e.g., 0.95). Correct decision: 95/100 times when there is no effect, we'll correctly say there is no effect.

If H0 is false, Ha true: probability β (e.g., 0.20). Type II error: saying there is no difference when there really is one. 20/100 times when there is an effect, we'll say there is no effect.

We reject H0, accept Ha (the alternative hypothesis):

If H0 is true, Ha false: probability α (e.g., 0.05). Type I error: saying there is a difference when there is no difference. 5/100 times when there is no effect, we'll say there is one.

If H0 is false, Ha true: probability 1 − β (e.g., 0.80). POWER: saying there is a difference when there is one. 80/100 times when there is an effect, we'll say there is one.

Page 7

Comments: Lower α, lower power; higher α, higher power.

Lower α is conservative in terms of rejecting the null when it's true (i.e., less prone to saying there's an effect when there really isn't).

Higher α increases the chance of a Type I error, decreases the chance of a Type II error, and decreases the rigor of the test.
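A quick Monte Carlo check of this trade-off (simulated normal data; the sample size, effect size, and replicate count are arbitrary illustrative choices): lowering α reduces the Type I error rate but also reduces power.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, effect, reps = 25, 0.5, 2000

for alpha in (0.01, 0.05, 0.10):
    type1 = power = 0
    for _ in range(reps):
        a = rng.normal(0, 1, n)
        b_null = rng.normal(0, 1, n)      # H0 true: no difference in means
        b_alt = rng.normal(effect, 1, n)  # H0 false: true difference = effect
        type1 += stats.ttest_ind(a, b_null).pvalue < alpha
        power += stats.ttest_ind(a, b_alt).pvalue < alpha
    print(f"alpha = {alpha:.2f}: Type I rate ~ {type1 / reps:.3f}, "
          f"power ~ {power / reps:.3f}")
```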

Page 8

Sample Design: Choosing a Sample Size

Can choose based on a target precision level (e.g., confidence interval width) or power (hypothesis testing).

Requires assumptions and tentative parameter values (e.g., effect size); therefore it is an exercise in approximation.

Might identify cases where the minimal sufficient sample size would bust the budget or be logistically impractical to achieve.
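As a sketch of power-based sample size choice: the standard normal-approximation formula for a two-sided, two-sample comparison of means gives $n$ per group $\approx 2(z_{1-\alpha/2} + z_{1-\beta})^2 \sigma^2 / \delta^2$. The δ and σ below are exactly the "tentative parameter values" that must be assumed.

```python
from scipy.stats import norm

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided two-sample test of means."""
    z_a = norm.ppf(1 - alpha / 2)  # critical value for the Type I error rate
    z_b = norm.ppf(power)          # quantile for the desired power (1 - beta)
    return 2 * ((z_a + z_b) * sigma / delta) ** 2

# Tentative values: detect a difference of 0.5 units when sigma = 1
print(n_per_group(delta=0.5, sigma=1.0))  # ~63 per group
```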

Page 9

Likelihood Ratio Tests

Compare the fit of a hypothesized model to another model (generally containing more parameters): a null model vs. an alternative model with additional parameters.

Based on maximum likelihood estimation theory: evaluate the MLE under the restricted and the more general parameterizations, then calculate the likelihood ratio.

The statistic is chi-square distributed, with degrees of freedom equal to the difference in the number of parameters between the models:

$$\chi^2 = -2 \log_e \left[ \frac{L(\hat{\theta}_0 \mid x)}{L(\hat{\theta}_a \mid x)} \right]$$
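A minimal sketch of a likelihood ratio test (made-up Poisson count data; the null model has one common rate, the alternative a separate rate per group, so df = 1):

```python
import numpy as np
from scipy import stats

# Made-up counts for two groups
x1 = np.array([3, 5, 4, 6, 2, 5])
x2 = np.array([7, 8, 6, 9, 7, 8])

def poisson_loglik(x, lam):
    """Poisson log-likelihood of data x at rate lam."""
    return np.sum(stats.poisson.logpmf(x, lam))

# Restricted (null) model: one common rate; the MLE is the pooled mean
pooled = np.concatenate([x1, x2])
ll_null = poisson_loglik(pooled, pooled.mean())

# General (alternative) model: a separate rate for each group
ll_alt = poisson_loglik(x1, x1.mean()) + poisson_loglik(x2, x2.mean())

# -2 log likelihood ratio ~ chi-square, df = difference in parameter count
lr = -2 * (ll_null - ll_alt)
df = 2 - 1
p = stats.chi2.sf(lr, df)
print(f"LR = {lr:.2f}, df = {df}, p = {p:.4f}")
```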

Page 10

Goodness of Fit (GOF)

"Absolute" fit of the model.

The goal is to determine whether the data are reflective of the statistical model.

The test statistic is generated from the probability model, using the estimated parameters.

Is there variation in the data that is out of the ordinary and not reflected in our statistical model?

Page 11

Pearson's $\chi^2$ GOF Test

Logic: if the model is "correct", expected and observed frequencies for each multinomial cell should be similar.

Imagine we roll a die 1000 times and want to determine whether P(x=1) = P(x=2) = … = P(x=6) = 1/6 is a good model.

If the sample size is adequate (expect at least 2 per cell),

$$\sum_i \frac{(\text{observed}_i - \text{expected}_i)^2}{\text{expected}_i} \sim \chi^2(\text{df} = \#\text{cells} - 1)$$
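For the die example, a sketch (simulated rolls; scipy.stats.chisquare defaults to equal expected frequencies, which is exactly the null model here):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
rolls = rng.integers(1, 7, size=1000)           # 1000 simulated fair-die rolls
observed = np.bincount(rolls, minlength=7)[1:]  # counts for faces 1..6

# Pearson chi-square GOF against H0: all faces equally likely
chi2_stat, p = stats.chisquare(observed)        # df = 6 cells - 1 = 5
print(f"chi2 = {chi2_stat:.2f}, p = {p:.4f}")
```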

Page 12

General GOF with Large Samples

Pearson's $\chi^2$

Direct use of the deviance:

$$\text{deviance} = -2 \log_e \left[ \frac{L(\hat{\theta}_0 \mid x)}{L(\hat{\theta}_{\text{saturated}} \mid x)} \right]$$

Page 13

Bootstrap GOF Test

Compute ML estimates for the parameters, then produce an empirical distribution of the test statistic:

Simulate capture histories for each released animal: assume parameter = MLE, "flip coins" to determine survival and capture for each period.

Repeat for {Ri} animals, estimate the parameters, and compute the deviance.

Compare the original deviance with the empirical distribution (i.e., at what percentile does it fall?).
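The capture-history details are model-specific, but the generic logic of the parametric bootstrap GOF test can be sketched with a simple stand-in model (a one-rate Poisson here; all values are simulated for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
observed = rng.poisson(4.0, size=50)  # stand-in for the real data set

def deviance_one_rate(x):
    """Deviance of the one-rate Poisson model vs. the saturated model."""
    ll_model = np.sum(stats.poisson.logpmf(x, x.mean()))
    ll_sat = np.sum(stats.poisson.logpmf(x, np.maximum(x, 1e-12)))
    return -2 * (ll_model - ll_sat)

dev_obs = deviance_one_rate(observed)

# Parametric bootstrap: simulate from the fitted model (parameter = MLE),
# refit each simulated data set, and collect its deviance
boot = np.array([deviance_one_rate(rng.poisson(observed.mean(),
                                               size=observed.size))
                 for _ in range(1000)])

# Where does the observed deviance fall in the empirical distribution?
print(f"P(simulated deviance >= observed) ~ {np.mean(boot >= dev_obs):.3f}")
```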

Page 14

What indicates lack of fit?

With a GOF test, the hope and purpose is to accept the null hypothesis.

This is counter to the usual logic of statistical hypothesis testing.

What is a "significant" P-value in this setting?

Page 15

What might cause lack of fit?

Inadequate model structure for detection or survival, e.g.:

Age dependence, size dependence, etc.

Trap dependence

Those released earlier survive at a different rate

Non-random temporary emigration

Lack of independence among animals

Page 16

Solutions

Inadequate model structure? Improve it.

Goal: subdivide animals sufficiently that p and S are equal within a group.

Warning: inadequate model structure doesn't always result in lack of fit, e.g.:

Permanent emigration (confounded with S)

Random temporary emigration (confounded with p)

Random ring loss (confounded with S)

Lack of independence? Correct for overdispersion: inflate variances using quasi-likelihood.

Page 17

Adjusting Variances for Overdispersion

Based on quasi-likelihood theory:

c-hat = deviance / df

adjusted variance = c-hat × (ML variance)
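In code form (the deviance, df, and ML variance below are placeholder numbers, not from a real model):

```python
deviance, df = 245.3, 180  # placeholder: model deviance and its df
ml_variance = 0.0012       # placeholder: ML variance of some estimate

c_hat = deviance / df               # quasi-likelihood inflation factor
adj_variance = c_hat * ml_variance  # adjusted variance
adj_se = adj_variance ** 0.5        # standard errors scale by sqrt(c-hat)
print(f"c-hat = {c_hat:.2f}, adjusted variance = {adj_variance:.6f}")
```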

Page 18

Bootstrap Adjustment for Overdispersion

For each simulated sample:

compute the deviance

compute c-hat = deviance / df

Bootstrap c-hat = (observed deviance) / (mean simulated deviance), or (observed c-hat) / (mean simulated c-hat)

Note: one could replace the deviance with Pearson's $\chi^2$, or the mean with the median.
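A sketch of the bootstrap c-hat computation (placeholder deviance values; in practice they come from the simulated samples described above):

```python
import numpy as np

# Placeholders: observed deviance and deviances from the simulated samples
dev_obs = 62.4
boot_devs = np.array([48.2, 51.7, 45.9, 55.3, 49.8, 52.1, 47.4, 50.6])

c_hat = dev_obs / boot_devs.mean()          # observed / mean simulated deviance
c_hat_med = dev_obs / np.median(boot_devs)  # median is less outlier-sensitive
print(f"bootstrap c-hat = {c_hat:.2f} (median-based: {c_hat_med:.2f})")
```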