LLJ110705

8/6/2019 LLJ110705


Overview of Hypothesis Testing

Laura Lee Johnson, Ph.D.

Statistician

National Center for Complementary and Alternative Medicine


Monday, November 7, 2005


Objectives

Discuss commonly used terms

P-value

Power

Type I and Type II errors

Present a few commonly used statistical tests for comparing two groups


Outline

Estimation and Hypotheses

Continuous Outcome/Known Variance

Test statistic, tests, p-values, confidence intervals

Unknown Variance

Different Outcomes/Similar Test Statistics

Additional Information


Statistical Inference

Inferences about a population are made on the basis of results obtained from a sample drawn from that population

Want to talk about the larger population from which the subjects are drawn, not the particular subjects!


What Do We Test

Effect or Difference we are interested in

Difference in Means or Proportions

Odds Ratio (OR)

Relative Risk (RR)

Correlation Coefficient

Clinically important difference

Smallest difference considered biologically or clinically relevant

Medicine: usually 2 group comparison of population means


Vocabulary (1)

Statistic: computed from a sample

Sampling Distribution: all possible values that the statistic can have, computed from samples of a given size randomly drawn from the same population

Parameter: computed from the population


Estimation: From the Sample

Point estimation

Mean

Median

Change in mean/median

Interval estimation

95% Confidence interval

Variation


Parameters and Reference Distributions

Continuous outcome data

Normal distribution: $N(\mu, \sigma^2)$

t distribution: $t_\nu$ ($\nu$ = degrees of freedom)

Mean = $\bar{X}$ (sample mean)

Variance = $s^2$ (sample variance)

Binary outcome data

Binomial distribution: $B(n, p)$

Mean = $np$, Variance = $np(1-p)$


Hypothesis Testing

Null hypothesis

Alternative hypothesis

Is a value of the parameter consistent with sample data?


Hypotheses and Probability

Specify structure

Build mathematical model and specify assumptions

Specify parameter values

What do you expect to happen?


Null Hypothesis

Usually that there is no effect

Mean = 0

OR = 1

RR = 1

Correlation Coefficient = 0

Generally fixed value: mean = 2

If an equivalence trial, look at NEJM paper or other specific resources


Alternative Hypothesis

Contradicts the null

There is an effect

What you want to prove

If equivalence trial, special way to do this


Example Hypotheses

H0: $\mu_1 = \mu_2$

HA: $\mu_1 \neq \mu_2$ (two-sided test)

HA: $\mu_1 > \mu_2$ (one-sided test)


1 vs. 2 Sided Tests

Two-sided test

No a priori reason 1 group should have stronger effect

Used for most tests

One-sided test

Specific interest in only one direction

Not scientifically relevant/interesting if reverse situation true





Experiment

Develop hypotheses

Collect sample/Conduct experiment

Calculate test statistic

Compare test statistic with what is expected when H0 is true

Reference distribution

Assumptions about distribution of outcome variable


Example: Hypertension/Cholesterol

Mean cholesterol hypertensive men

Mean cholesterol in male general population (20-74 years old)

In the 20-74 year old male population the mean serum cholesterol is 211 mg/ml with a standard deviation of 46 mg/ml


Cholesterol Hypotheses

H0: $\mu_1 = \mu_2$

H0: $\mu = 211$ mg/ml

$\mu$ = population mean serum cholesterol for male hypertensives

Mean cholesterol for hypertensive men = mean for general male population

HA: $\mu_1 \neq \mu_2$

HA: $\mu \neq 211$ mg/ml


Cholesterol Sample Data

25 hypertensive men

Mean serum cholesterol level is 220 mg/ml ($\bar{X}$ = 220 mg/ml)

Point estimate of the mean

Sample standard deviation: s = 38.6 mg/ml

Point estimate of the variance = $s^2$


Experiment

Develop hypotheses

Collect sample/Conduct experiment

Calculate test statistic

Compare test statistic with what is expected when H0 is true

Reference distribution

Assumptions about distribution of outcome variable


Test Statistic

Basic test statistic for a mean

$$\text{test statistic} = \frac{\text{point estimate of } \mu - \text{target value of } \mu}{\text{point estimate of } \sigma_\mu}$$

$\sigma_\mu$ = standard deviation of the point estimate of $\mu$

For 2-sided test: Reject H0 when the test statistic is in the upper or lower $100 \cdot \alpha/2$% of the reference distribution

What is $\alpha$?


Vocabulary (2)

Types of errors

Type I ($\alpha$)

Type II ($\beta$)

Related words

Significance Level: $\alpha$ level

Power: $1-\beta$


Type I Error

$\alpha$ = P( reject H0 | H0 true )

Probability reject the null hypothesis given the null is true

False positive

Probability reject that hypertensives' $\mu$ = 211 mg/ml when in truth the mean cholesterol for hypertensives is 211


Type II Error (or, 1-Power)

$\beta$ = P( do not reject H0 | H1 true )

False Negative

Power = $1-\beta$ = P( reject H0 | H1 true )

Everyone wants high power, and therefore low Type II error
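
To make the link between power and Type II error concrete, here is a minimal sketch that computes the power of a two-sided Z test. It reuses the cholesterol numbers from this talk (null mean 211, standard deviation 46, n = 25); the alternative mean of 220 and the function name `z_test_power` are assumptions chosen only for illustration.

```python
from statistics import NormalDist


def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power of a two-sided Z test of H0: mu = mu0 when the true mean is mu1."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Distance between the true mean and the null mean, in standard-error units.
    shift = (mu1 - mu0) / (sigma / n ** 0.5)
    # P(reject H0 | H1 true): the test statistic lands in either rejection tail.
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)


# mu1 = 220 is a hypothetical alternative, not a value given as "truth" in the talk.
power = z_test_power(mu0=211, mu1=220, sigma=46, n=25)
beta = 1 - power  # Type II error rate
print(f"power = {power:.3f}, beta = {beta:.3f}")
```

When `mu1` equals `mu0` the function returns the significance level itself, which is one way to see that alpha is the rejection probability under the null.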


Z Test Statistic

Want to test continuous outcome

Known variance

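Under H0

$$\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1)$$

Therefore,

Reject H0 if $\left|\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}\right| > 1.96$ (gives a 2-sided $\alpha$ = 0.05 test)

Reject H0 if $\bar{X} > \mu_0 + 1.96\,\frac{\sigma}{\sqrt{n}}$ or $\bar{X} < \mu_0 - 1.96\,\frac{\sigma}{\sqrt{n}}$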


Z or Standard Normal Distribution


General Formula (1-$\alpha$)% Rejection Region for Mean Point Estimate

$$\left(\mu - \frac{Z_{1-\alpha/2}\,\sigma}{\sqrt{n}},\ \mu + \frac{Z_{1-\alpha/2}\,\sigma}{\sqrt{n}}\right)$$

Note that $+Z(\alpha/2) = -Z(1-\alpha/2)$

90% CI: Z = 1.645

95% CI: Z = 1.96

99% CI: Z = 2.58


Do Not Reject H0

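$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{220 - 211}{46/\sqrt{25}} = 0.98 < 1.96$$

Is $220 = \bar{X} > \mu_0 + 1.96\,\frac{\sigma}{\sqrt{n}} = 211 + 1.96 \cdot \frac{46}{\sqrt{25}} = 229.03$? NO!

Is $220 = \bar{X} < \mu_0 - 1.96\,\frac{\sigma}{\sqrt{n}} = 211 - 1.96 \cdot \frac{46}{\sqrt{25}} = 192.97$? NO!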
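
The Z test for the cholesterol example can be checked with a few lines of Python. This is only a sketch using the numbers from the example (sample mean 220, null mean 211, known standard deviation 46, n = 25); the function name `z_statistic` is an illustrative choice.

```python
from math import sqrt


def z_statistic(xbar, mu0, sigma, n):
    """Z test statistic for a mean when the population variance is known."""
    return (xbar - mu0) / (sigma / sqrt(n))


xbar, mu0, sigma, n = 220, 211, 46, 25
z = z_statistic(xbar, mu0, sigma, n)

# Equivalent check on the scale of the sample mean (two-sided alpha = 0.05).
half_width = 1.96 * sigma / sqrt(n)
lower, upper = mu0 - half_width, mu0 + half_width

reject_h0 = abs(z) > 1.96
print(f"Z = {z:.2f}, cutoffs = ({lower:.2f}, {upper:.2f}), reject H0: {reject_h0}")
```

Both views give the same decision: the statistic is compared with 1.96, or the sample mean is compared with the two cutoffs around the null mean.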


P-value

Smallest $\alpha$ at which the observed sample would reject H0

Given H0 is true, probability of obtaining a result as extreme or more extreme than the actual sample

MUST be based on a model

Normal, t, binomial, etc.


Cholesterol Example

P-value for two sided test

$\bar{X}$ = 220 mg/ml, $\sigma$ = 46 mg/ml

n = 25

H0: $\mu$ = 211 mg/ml

HA: $\mu \neq$ 211 mg/ml

$$2 \cdot P[\bar{X} > 220] = 0.33$$
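
As a sketch of how this two-sided p-value can be computed under the normal model, the snippet below uses Python's standard-library `statistics.NormalDist`. The function name `two_sided_p_value` is illustrative; the inputs are the cholesterol example values.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(xbar, mu0, sigma, n):
    """Two-sided p-value for a Z test with known standard deviation."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    # Probability, under H0, of a statistic at least this far from 0 in either tail.
    return 2 * (1 - NormalDist().cdf(abs(z)))


p_value = two_sided_p_value(xbar=220, mu0=211, sigma=46, n=25)
print(f"p-value = {p_value:.2f}")
```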


Determining Statistical Significance: Critical Value Method

Compute the test statistic Z (0.98)

Compare to the critical value

Standard Normal value at $\alpha$-level (1.96)

If |test statistic| > critical value

Reject H0

Results are statistically significant

If |test statistic| < critical value

Do not reject H0

Results are not statistically significant


Determining Statistical Significance: P-Value Method

Compute the exact p-value (0.33)

Compare to the predetermined $\alpha$-level (0.05)

If p-value < predetermined $\alpha$-level

Reject H0

Results are statistically significant

If p-value > predetermined $\alpha$-level

Do not reject H0

Results are not statistically significant


Hypothesis Testing and Confidence Intervals

Hypothesis testing focuses on where the sample mean is located

Confidence intervals focus on plausible values for the population mean


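General Formula (1-$\alpha$)% CI for $\mu$

Construct an interval around the point estimate

Look to see if the population/null mean is inside

$$\left(\bar{X} - \frac{Z_{1-\alpha/2}\,\sigma}{\sqrt{n}},\ \bar{X} + \frac{Z_{1-\alpha/2}\,\sigma}{\sqrt{n}}\right)$$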





T-Test Statistic

Want to test continuous outcome

Unknown variance

Under H0

$$\frac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim t_{(n-1)}$$

Critical values: statistics books or computer

t-distribution approximately normal for degrees of freedom (df) > 30


Cholesterol: t-statistic

Using data

$$T = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{220 - 211}{38.6/\sqrt{25}} = 1.17$$

For $\alpha$ = 0.05, two-sided test from t(24) distribution the critical value = 2.064

| T | = 1.17 < 2.064

The difference is not statistically significant at the $\alpha$ = 0.05 level

Fail to reject H0
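
The same calculation for the t statistic, as a minimal sketch. The critical value 2.064 is taken from the slide rather than computed, because the Python standard library has no t-distribution quantile function; the names `t_statistic` and `T_CRIT_DF24` are illustrative.

```python
from math import sqrt


def t_statistic(xbar, mu0, s, n):
    """One-sample t statistic: the sample standard deviation replaces sigma."""
    return (xbar - mu0) / (s / sqrt(n))


T_CRIT_DF24 = 2.064  # two-sided alpha = 0.05 critical value for t(24), from the slide

t = t_statistic(xbar=220, mu0=211, s=38.6, n=25)
reject_h0 = abs(t) > T_CRIT_DF24
print(f"T = {t:.2f}, reject H0: {reject_h0}")
```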


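CI for the Mean, Unknown Variance

Pretty common

Uses the t distribution

Degrees of freedom

$$\left(\bar{X} - \frac{t_{n-1,\,1-\alpha/2}\; s}{\sqrt{n}},\ \bar{X} + \frac{t_{n-1,\,1-\alpha/2}\; s}{\sqrt{n}}\right) = \left(220 - \frac{2.064 \cdot 38.6}{\sqrt{25}},\ 220 + \frac{2.064 \cdot 38.6}{\sqrt{25}}\right) = (204.06, 235.93)$$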
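
A short sketch of the t-based confidence interval for the cholesterol data. As before, the multiplier 2.064 for t(24) is taken from the slide, and the function name `t_confidence_interval` is only illustrative.

```python
from math import sqrt


def t_confidence_interval(xbar, s, n, t_crit):
    """Confidence interval for a mean when the variance is estimated from the sample."""
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width


ci_low, ci_high = t_confidence_interval(xbar=220, s=38.6, n=25, t_crit=2.064)
null_mean_inside = ci_low <= 211 <= ci_high
print(f"95% CI = ({ci_low:.2f}, {ci_high:.2f}), contains 211: {null_mean_inside}")
```

The interval contains the null mean of 211, which agrees with the t test failing to reject H0 at the 0.05 level.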





Paired Tests: Difference Two Continuous Outcomes

Exact same idea

Known variance: Z test statistic

Unknown variance: t test statistic

H0: $\mu_d = 0$ vs. HA: $\mu_d \neq 0$

Paired Z-test or Paired t-test

$$Z = \frac{\bar{d}}{\sigma/\sqrt{n}} \quad \text{or} \quad T = \frac{\bar{d}}{s/\sqrt{n}}$$
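
A paired t statistic is just the one-sample t statistic applied to the within-pair differences. The sketch below shows this; the before/after measurements are made-up numbers for illustration only, and `paired_t` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t statistic and degrees of freedom for H0: mean difference = 0."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))  # stdev uses the n - 1 denominator
    return t, n - 1


# Hypothetical measurements on the same five subjects (not data from the talk).
before = [210, 225, 198, 240, 215]
after = [200, 220, 199, 228, 210]
t_paired, df = paired_t(before, after)
print(f"T = {t_paired:.2f} on {df} degrees of freedom")
```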


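Unpaired Tests: Common Variance

Same idea

H0: $\mu_1 = \mu_2$ vs. HA: $\mu_1 \neq \mu_2$

Assume common variance

$$Z = \frac{\bar{x} - \bar{y}}{\sigma\sqrt{1/n + 1/m}} \quad \text{or} \quad T = \frac{\bar{x} - \bar{y}}{s\sqrt{1/n + 1/m}}$$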


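Unpaired Tests: Not Common Variance

Same idea

H0: $\mu_1 = \mu_2$ vs. HA: $\mu_1 \neq \mu_2$

$$Z = \frac{\bar{x} - \bar{y}}{\sqrt{\sigma_1^2/n + \sigma_2^2/m}} \quad \text{or} \quad T = \frac{\bar{x} - \bar{y}}{\sqrt{s_1^2/n + s_2^2/m}}$$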
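
The two unpaired t statistics differ only in how the standard error is built. The following sketch computes both, assuming that the common-variance version uses the usual pooled sample variance for s; the data and the function names `pooled_t` and `unequal_variance_t` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Two-sample t statistic assuming a common variance (pooled estimate)."""
    n, m = len(x), len(y)
    pooled_var = ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var * (1 / n + 1 / m))


def unequal_variance_t(x, y):
    """Two-sample t statistic without assuming a common variance."""
    n, m = len(x), len(y)
    return (mean(x) - mean(y)) / sqrt(variance(x) / n + variance(y) / m)


# Hypothetical groups of unequal size (not data from the talk).
group_x = [5.1, 4.9, 6.0, 5.5]
group_y = [4.2, 4.8, 3.9, 4.5, 4.1, 4.4]
print(f"pooled T = {pooled_t(group_x, group_y):.2f}")
print(f"unequal-variance T = {unequal_variance_t(group_x, group_y):.2f}")
```

With equal group sizes the two statistics coincide; they differ when the sizes and variances are unequal, and the reference degrees of freedom differ as well.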


Binary Outcomes

Exact same idea

For large samples

Use Z test statistic

Now set up in terms of proportions, not means

$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$$


Two Population Proportions

Exact same idea

For large samples use Z test statistic

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{m}}}$$
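
A minimal sketch of the large-sample Z statistic for two proportions, using the unpooled standard error shown above. The event counts are invented for illustration, and `two_proportion_z` is a hypothetical helper name.

```python
from math import sqrt


def two_proportion_z(events_1, n, events_2, m):
    """Large-sample Z statistic comparing two independent sample proportions."""
    p1, p2 = events_1 / n, events_2 / m
    standard_error = sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / m)
    return (p1 - p2) / standard_error


# Hypothetical counts: 30 of 100 events in group 1, 20 of 100 in group 2.
z_prop = two_proportion_z(30, 100, 20, 100)
print(f"Z = {z_prop:.2f}, significant at two-sided 0.05: {abs(z_prop) > 1.96}")
```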


Vocabulary (3)

Null Hypothesis: H0

Alternative Hypothesis: H1 or Ha or HA

Significance Level: $\alpha$ level

Acceptance/Rejection Region

Statistically Significant

Test Statistic

Critical Value

P-value





Linear regression

Model for simple linear regression

$Y_i = \beta_0 + \beta_1 x_{1i} + \epsilon_i$

$\beta_0$ = intercept

$\beta_1$ = slope

Assumptions

Observations are independent

Normally distributed with constant variance

Hypothesis testing

H0: $\beta_1 = 0$ vs. HA: $\beta_1 \neq 0$


Confidence Interval Note

Cannot determine if a particular interval does/does not contain true mean effect

Can say in the long run

Take many samples

Same sample size

From the same population

95% of similarly constructed confidence intervals will contain true mean effect


Interpret a 95% Confidence Interval (CI) for the population mean, $\mu$

If we were to find many such intervals, each from a different random sample but in exactly the same fashion, then, in the long run, about 95% of our intervals would include the population mean, $\mu$, and 5% would not.
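
The long-run interpretation can be illustrated with a small simulation. This is a sketch under stated assumptions: samples are drawn from a normal population with the cholesterol example's mean and standard deviation, the variance is treated as known, and the number of repetitions and the random seed are arbitrary choices.

```python
import random
from math import sqrt


def ci_coverage(mu=211, sigma=46, n=25, trials=2000, seed=0):
    """Fraction of simulated 95% Z intervals that contain the true mean mu."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        # Each interval either contains mu or it does not; count the ones that do.
        if sample_mean - half_width <= mu <= sample_mean + half_width:
            hits += 1
    return hits / trials


coverage = ci_coverage()
print(f"share of intervals containing the true mean: {coverage:.3f}")
```

Any single interval either contains the true mean or does not; only the proportion across many repetitions settles near 95%.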


How NOT to interpret a 95% CI

Do NOT say: "There is a 95% probability that the true mean lies between the two confidence values we obtain from a particular sample."

We can say that we are 95% confident that it does lie between these two values.

Overlapping CIs do NOT imply non-significance


But I Have All Zeros!

Calculate upper bound

Known # of trials without an event (2.11 van Belle 2002, Louis 1981)

Given no observed events in n trials, 95% upper bound on rate of occurrence is 3 / (n + 1)

No fatal outcomes in 20 operations

95% upper bound on rate of occurrence = 3 / (20 + 1) = 0.143, so the rate of occurrence of fatalities could be as high as 14.3%
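
The 3 / (n + 1) rule is easy to check numerically. The sketch below also computes, for comparison, the exact binomial bound obtained by solving (1 - p)^n = 0.05 for p; that comparison is an addition for illustration and is not part of the slide, and both function names are hypothetical.

```python
def upper_bound_rule(n):
    """Approximate 95% upper bound on the event rate after n trials with no events."""
    return 3 / (n + 1)


def upper_bound_exact(n, alpha=0.05):
    """Exact one-sided upper bound: the rate p at which P(0 events in n trials) = alpha."""
    return 1 - alpha ** (1 / n)


n_operations = 20
approx_bound = upper_bound_rule(n_operations)
exact_bound = upper_bound_exact(n_operations)
print(f"rule: {approx_bound:.3f}, exact: {exact_bound:.3f}")
```

For 20 operations the rule gives a bound slightly above the exact calculation, so in this case it errs on the cautious side.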


Analysis Follows Design

Questions → Hypotheses → Experimental Design → Samples → Data → Analyses → Conclusions


Which is more important, $\alpha$ or $\beta$?

Depends on the question

Most will say protect against Type I error

Need to think about individual and population health implications and costs


Affy Gene Chip

False negative

Miss what could be important

Are these samples going to be looked at again?

False positive

Waste resources following dead ends


HIV Screening

False positive

Needless worry

Stigma

False negative

Thinks everything is ok

Continues to spread disease

For cholesterol example?


What do you need to think about?

Is it worse to treat those who truly are not ill or to not treat those who are ill?

That answer will help guide you as to what amount of error you are willing to tolerate in your trial design.


Little Diagnostic Testing Lingo

False Positive/False Negative

Positive Predictive Value

Probability diseased given POSITIVE test result

Negative Predictive Value

Probability NOT diseased given NEGATIVE test result

Predictive values depend on disease prevalence
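
To see how predictive values move with prevalence, here is a small sketch based on Bayes' rule. The sensitivity, specificity, and prevalence values are hypothetical and chosen only for illustration; `predictive_values` is an illustrative helper name.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)  # P(diseased | positive test)
    npv = true_neg / (true_neg + false_neg)  # P(not diseased | negative test)
    return ppv, npv


# Hypothetical test with 99% sensitivity and 99% specificity.
for prevalence in (0.01, 0.10):
    ppv, npv = predictive_values(0.99, 0.99, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV = {ppv:.3f}, NPV = {npv:.4f}")
```

Even with a very accurate test, the positive predictive value is modest when the disease is rare, because false positives from the large healthy group rival the true positives.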





What you should know: CI

Meaning/interpretation of the CI

How to compute a CI for the true mean when variance is known (normal model)

How to compute a CI for the true mean when the variance is NOT known (t distribution)


You Need to Know

How to turn a question into hypotheses

Failing to reject the null hypothesis DOES NOT mean that the null is true

Every test has assumptions

A statistician can check all the assumptions

If the data do not meet the assumptions there are non-parametric versions of the tests (see text)


P-value Interpretation Reminders

Measure of the strength of evidence in the data that the null is not true

A random variable whose value lies between 0 and 1

NOT the probability that the null hypothesis is true.


Avoid Common Mistakes: Hypothesis Testing

If you have paired data, use a paired test

If you don't then you can lose power

If you do NOT have paired data, do NOT use a paired test

You can have the wrong inference


Common Mistakes: Hypothesis Testing

These tests have assumptions of independence

Taking multiple samples per subject? Statistician MUST know

Different statistical analyses MUST be used and they can be difficult!

Distribution of the observations

Histogram of the observations

Highly skewed data - t test - incorrect results


Common Mistakes: Hypothesis Testing

Assume equal variances and the variances are not equal

Did not show variance test

Not that good of a test

ALWAYS graph your data first to assess symmetry and variance

Not talking to a statistician


Misconceptions

Misconception: smaller p-value means larger effect

Effect size is determined by the difference in the sample mean or proportion between 2 groups

P-value = inferential tool

Helps demonstrate that population means in two groups are not equal


Misconceptions

A small p-value means the difference is statistically significant, not that the difference is clinically significant

A large sample size can help get a small p-value

Failing to reject H0

There is not enough evidence to reject H0

Does NOT mean H0 is true


Analysis Follows Design

Questions → Hypotheses → Experimental Design → Samples → Data → Analyses → Conclusions


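[Flowchart: choosing a test when the data are not normal/large sample]

Normal/Large Sample Data? No: go to Binomial?

Binomial? No: Nonparametric test. Yes: go to Independent?

Independent? No: McNemar's test. Yes: go to Expected ≥ 5?

Expected ≥ 5? Yes: 2 sample Z test for proportions or contingency table. No: Fisher's Exact test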


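[Flowchart: choosing a test when the data are normal/large sample]

Normal/Large Sample Data? Yes: go to Inference on means?

Inference on means? Yes: go to Independent? No: go to Inference on variance?

Independent? No: Paired t. Yes: go to Variance known?

Variance known? Yes: Z test. No: go to Variances equal?

Variances equal? Yes: T test w/ pooled variance. No: T test w/ unequal variance

Inference on variance? Yes: F test for variances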


Questions?
