LLJ110705

Embed Size (px)

Citation preview

  • 8/6/2019 LLJ110705

    1/71

    Overview of HypothesisTesting

    Laura Lee Johnson, Ph.D.StatisticianNational Center for Complementary

    and Alternative Medicine

    [email protected]

    Monday, November 7, 2005

  • 8/6/2019 LLJ110705

    2/71

    Objectives

    Discuss commonly used terms

    P-value

    Power

    Type I and Type II errors

    Present a few commonly usedstatistical tests for comparing two

    groups

  • 8/6/2019 LLJ110705

    3/71

    Outline

    Estimation and Hypotheses

    Continuous Outcome/Known Variance

    Test statistic, tests, p-values,confidence intervals

    Unknown Variance

    Different Outcomes/Similar TestStatistics

    Additional Information

  • 8/6/2019 LLJ110705

    4/71

    Statistical Inference

    Inferences about a population are

    made on the basis of results

    obtained from a sample drawnfrom that population

    Want to talk about the larger

    population from which the subjectsare drawn, not the particular

    subjects!

  • 8/6/2019 LLJ110705

    5/71

    What Do We Test

    Effect or Difference we are interested in Difference in Means or Proportions

    Odds Ratio (OR)

    Relative Risk (RR) Correlation Coefficient

    Clinically important difference Smallest difference considered

    biologically or clinically relevant Medicine: usually 2 group comparison

    of population means

  • 8/6/2019 LLJ110705

    6/71

    Vocabulary (1)

    Statistic

    Compute from sample

    Sampling Distribution All possible values that statistic can have

    Compute from samples of a given size

    randomly drawn from the same population Parameter

    Compute from population

  • 8/6/2019 LLJ110705

    7/71

    Estimation: From the Sample

    Point estimation

    Mean

    Median

    Change in mean/median

    Interval estimation95% Confidence interval

    Variation

  • 8/6/2019 LLJ110705

    8/71

    Parameters and

    Reference Distributions Continuous outcome data

    Normal distribution: N( ,2)

    t distribution: t[ ([ = degrees of freedom) Mean = (sample mean)

    Variance = s2 (sample variance)

    Binary outcome data Binomial distribution: B (n, p)

    Mean = np, Variance = np(1-p)

    X

  • 8/6/2019 LLJ110705

    9/71

    Hypothesis Testing

    Null hypothesis

    Alternative hypothesis

    Is a value of the parameterconsistent with sample data

  • 8/6/2019 LLJ110705

    10/71

    Hypotheses and Probability

    Specify structure

    Build mathematical model and

    specify assumptions Specify parameter values

    What do you expect to happen?

  • 8/6/2019 LLJ110705

    11/71

    Null Hypothesis

    Usually that there is no effect

    Mean = 0

    OR = 1RR = 1

    Correlation Coefficient = 0

    Generally fixed value: mean = 2 If an equivalence trial, look at NEJM

    paper or other specific resources

  • 8/6/2019 LLJ110705

    12/71

    Alternative Hypothesis

    Contradicts the null

    There is an effect

    What you want to prove

    If equivalence trial, special way to

    do this

  • 8/6/2019 LLJ110705

    13/71

    Example Hypotheses

    H0: 1 = 2

    HA: 1 2

    Two-sided test

    HA: 1 > 2

    One-sided test

  • 8/6/2019 LLJ110705

    14/71

    1 vs. 2 Sided Tests

    Two-sided test

    No a priorireason 1 group shouldhave stronger effect

    Used for most tests

    One-sided test

    Specific interest in only one direction

    Not scientifically relevant/interestingif reverse situation true

  • 8/6/2019 LLJ110705

    15/71

    Outline

    Estimation and Hypotheses

    Continuous Outcome/Known Variance

    Test statistic, tests, p-values,confidence intervals

    Unknown Variance

    Different Outcomes/Similar TestStatistics

    Additional Information

  • 8/6/2019 LLJ110705

    16/71

    Experiment

    Develop hypotheses

    Collect sample/Conduct

    experiment Calculate test statistic

    Compare test statistic with what is

    expected when H0 is true Reference distribution

    Assumptions about distribution ofoutcome variable

  • 8/6/2019 LLJ110705

    17/71

    Example:

    Hypertension/Cholesterol Mean cholesterol hypertensive men

    Mean cholesterol in male general

    population (20-74 years old) In the 20-74 year old male

    population the mean serum

    cholesterol is 211 mg/ml with astandard deviation of 46 mg/ml

  • 8/6/2019 LLJ110705

    18/71

    Cholesterol Hypotheses

    H0: 1 = 2 H0: = 211 mg/ml

    = population mean serumcholesterol for male hypertensives

    Mean cholesterol for hypertensivemen = mean for general male

    population HA: 1 2 HA: 211 mg/ml

  • 8/6/2019 LLJ110705

    19/71

    Cholesterol Sample Data

    25 hypertensive men

    Mean serum cholesterol level is

    220mg/ml ( = 220 mg/ml)Point estimate of the mean

    Sample standard deviation: s =

    38.6 mg/ml

    Point estimate of the variance = s2

    X

  • 8/6/2019 LLJ110705

    20/71

    Experiment

    Develop hypotheses

    Collect sample/Conduct experiment

    Calculate test statistic Compare test statistic with what is

    expected when H0 is true

    Reference distribution

    Assumptions about distribution ofoutcome variable

  • 8/6/2019 LLJ110705

    21/71

    Test Statistic

    Basic test statistic for a mean

    = standard deviation

    For 2-sided test: Reject H0

    when thetest statistic is in the upper or lower100*/2% of the reference distribution

    What is ?

    point estimate of

    point estimate of - target value oftest statistic =

    Q

    Q Q

    W

  • 8/6/2019 LLJ110705

    22/71

    Vocabulary (2)

    Types of errors

    Type I ()

    Type II ()

    Related words

    Significance Level: levelPower: 1-

  • 8/6/2019 LLJ110705

    23/71

  • 8/6/2019 LLJ110705

    24/71

    Type I Error

    = P( reject H0 | H0 true)

    Probability reject the null hypothesis

    given the null is true False positive

    Probability reject that hypertensives

    =211mg/ml when in truth the meancholesterol for hypertensives is 211

  • 8/6/2019 LLJ110705

    25/71

    Type II Error (or, 1-Power)

    = P( do not reject H0 | H1 true )

    False Negative

    Power = 1- = P( reject H0 | H1 true )

    Everyone wants high power, and

    therefore low Type II error

  • 8/6/2019 LLJ110705

    26/71

    Z Test Statistic

    Want to test continuous outcome

    Known variance

    Under H0

    Therefore,0

    0

    0 0 0

    Reject H if 1.96 (gives a 2-sided =0.05 test)/

    Reject H if X > 1.96 or X < 1.96

    X

    n

    n n

    QE

    W

    W WQ Q

    "

    0 ~ (0,1)/

    X

    Nn

    Q

    W

  • 8/6/2019 LLJ110705

    27/71

    Z or Standard Normal

    Distribution

  • 8/6/2019 LLJ110705

    28/71

    General Formula (1-)%

    Rejection Region for Mean Point

    Estimate

    Note that +Z(/2) = - Z(1-/2) 90% CI : Z = 1.645 95% CI :Z = 1.96

    99% CI : Z = 2.58

    1 / 2 1 / 2,Z Z

    n n

    E EW W

    Q Q

  • 8/6/2019 LLJ110705

    29/71

    Do Not Reject H0

    0 220 211 0.98 1.96

    / 46 / 25

    XZ

    n

    Q

    W

    ! ! !

    0

    0

    46220=X > 1.96 =211+1.96* 228.03 NO!

    2546

    220=X < 1.96 211-1.96* 192.97 NO!25

    n

    n

    WQ

    WQ

    !

    ! !

  • 8/6/2019 LLJ110705

    30/71

    P-value

    Smallest the observed sample would

    reject H0

    Given H0 is true, probability ofobtaining a result as extreme or more

    extreme than the actual sample

    MUST be based on a model

    Normal, t, binomial, etc.

  • 8/6/2019 LLJ110705

    31/71

    Cholesterol Example

    P-value for two sided test

    = 220 mg/ml, = 46 mg/ml

    n = 25

    H0: = 211 mg/ml

    HA: 211 mg/ml

    2* [ 220] 0.33P X " !

    X

  • 8/6/2019 LLJ110705

    32/71

    Determining Statistical

    Significance: Critical Value Method Compute the test statistic Z (0.98)

    Compare to the critical value

    Standard Normal value at -level (1.96)

    If |test statistic| > critical value

    Reject H0 Results are statisticallysignificant

    If |test statistic| < critical value Do not reject H0 Results are notstatisticallysignificant

  • 8/6/2019 LLJ110705

    33/71

    Determining Statistical

    Significance: P-Value Method

    Compute the exact p-value (0.33)

    Compare to the predetermined -level(0.05)

    If p-value < predetermined -level

    Reject H0 Results are statisticallysignificant

    If p-value > predetermined -level Do not reject H0 Results are notstatisticallysignificant

  • 8/6/2019 LLJ110705

    34/71

    Hypothesis Testing

    and Confidence Intervals

    Hypothesis testing focuses on

    where the sample mean is located

    Confidence intervals focus onplausible values for the population

    mean

  • 8/6/2019 LLJ110705

    35/71

    General Formula (1-)% CI for

    Construct an interval around the point

    estimate Look to see if the population/null mean

    is inside

    1 / 2 1 / 2,Z Z

    X X

    n n

    E EW W

  • 8/6/2019 LLJ110705

    36/71

    Outline

    Estimation and Hypotheses

    Continuous Outcome/Known Variance

    Test statistic, tests, p-values,confidence intervals

    Unknown Variance

    Different Outcomes/Similar TestStatistics

    Additional Information

  • 8/6/2019 LLJ110705

    37/71

    T-Test Statistic

    Want to test continuous outcome

    Unknown variance

    Under H0

    Critical values: statistics books or

    computer t-distribution approximately normal for

    degrees of freedom (df) >30

    0( 1)~

    /n

    Xt

    s nQ

  • 8/6/2019 LLJ110705

    38/71

    Cholesterol: t-statistic

    Using data

    For = 0.05, two-sided test from t(24)distribution the critical value = 2.064

    | T | = 1.17 < 2.064

    The difference is not statisticallysignificant at the = 0.05 level

    Fail to reject H0

    0 220 211 1.17/ 38.6 / 25

    XT

    s n

    Q ! ! !

  • 8/6/2019 LLJ110705

    39/71

    CI for the Mean, Unknown

    Variance

    Pretty common

    Uses the t distribution

    Degrees of freedom1,1 / 2 1,1 / 2

    ,

    2.064*38.6 2.064*38.6220 , 22025 25

    (204.06, 235.93)

    n nt s t s

    X X

    n n

    E E

    !

    !

  • 8/6/2019 LLJ110705

    40/71

    Outline

    Estimation and Hypotheses

    Continuous Outcome/Known Variance

    Test statistic, tests, p-values,confidence intervals

    Unknown Variance

    Different Outcomes/Similar TestStatistics

    Additional Information

  • 8/6/2019 LLJ110705

    41/71

    Paired Tests: Difference

    Two Continuous Outcomes

    Exact same idea

    Known variance: Z test statistic

    Unknown variance: t test statistic

    H0: d = 0 vs. HA: d 0

    Paired Z-test or Paired t-test

    / /

    d d Z or T

    n s nW! !

  • 8/6/2019 LLJ110705

    42/71

    Unpaired Tests: Common

    Variance

    Same idea

    Known variance: Z test statistic

    Unknown variance: t test statistic

    H0: 1 = 2 vs. HA: 1 2

    Assume common variance

    1/ 1/ 1/ 1/

    x y x y Z or T

    n m s n mW

    ! !

  • 8/6/2019 LLJ110705

    43/71

    Unpaired Tests:

    Not Common Variance

    Same idea

    Known variance: Z test statistic

    Unknown variance: t test statistic

    H0: 1 = 2 vs. HA: 1 2

    2 2 2 2

    1 2 1 2/ / / /

    x y x y Z or T

    n m s n s mW W ! !

  • 8/6/2019 LLJ110705

    44/71

    Binary Outcomes

    Exact same idea

    For large samples

    Use Z test statistic

    Now set up in terms of

    proportions, not means

    0

    0 0

    (1 ) /

    p pZ

    p p n

    !

  • 8/6/2019 LLJ110705

    45/71

    Two Population Proportions

    Exact same idea

    For large samples use Z test

    statistic

    1 2

    1 1 2 2

    (1 ) (1 )

    p pZ

    p p p p

    n m

    !

  • 8/6/2019 LLJ110705

    46/71

    Vocabulary (3)

    Null Hypothesis: H0 Alternative Hypothesis: H1 or Ha or HA Significance Level: level

    Acceptance/Rejection Region

    Statistically Significant

    Test Statistic

    Critical Value

    P-value

  • 8/6/2019 LLJ110705

    47/71

    Outline

    Estimation and Hypotheses

    Continuous Outcome/Known Variance

    Test statistic, tests, p-values,confidence intervals

    Unknown Variance

    Different Outcomes/Similar TestStatistics

    Additional Information

  • 8/6/2019 LLJ110705

    48/71

    Linear regression Model for simple linear regression

    Yi= 0+ 1x1i + i0= intercept

    1

    = slope

    Assumptions

    Observations are independent

    Normally distributed with constant

    variance Hypothesis testing

    H0:1 = 0 vs. HA:1 0

  • 8/6/2019 LLJ110705

    49/71

    Confidence Interval Note

    Cannot determine if a particular intervaldoes/does not contain true mean effect

    Can say in the long run

    Take many samples

    Same sample size

    From the same population

    95% of similarly constructedconfidence intervals will contain truemean effect

    Interpret a 95% Confidence

  • 8/6/2019 LLJ110705

    50/71

    Interpret a 95% Confidence

    Interval (CI) for the population

    mean, If we were to find many such

    intervals, each from a different

    random sample but in exactly thesame fashion, then, in the long run,

    about 95% of our intervals would

    include the population mean, ,and 5% would not.

  • 8/6/2019 LLJ110705

    51/71

    How NOT to interpret a 95%

    CI There is a 95% probability that the true

    mean lies between the two confidence

    values we obtain from a particular

    sample, but we can say that we are 95%

    confident that it does lie between these

    two values.

    Overlapping CIs do NOT imply non-significance

  • 8/6/2019 LLJ110705

    52/71

    But I Have All Zeros!

    Calculate upper bound

    Known # of trials without an event (2.11van Belle 2002, Louis 1981)

    Given no observed events in n trials,

    95% upper bound on rate of occurrenceis 3 / (n + 1)

    No fatal outcomes in 20 operations

    95% upper bound on rate of

    occurrence = 3/ (20 + 1) = 0.143, sothe rate of occurrence of fatalitiescould be as high as 14.3%

  • 8/6/2019 LLJ110705

    53/71

    Analysis Follows Design

    Questions Hypotheses

    Experimental Design Samples

    Data Analyses Conclusions

  • 8/6/2019 LLJ110705

    54/71

    Which is more important, or

    ?

    Depends on the question

    Most will say protect against Type I

    error Need to think about individual and

    population health implications and

    costs

  • 8/6/2019 LLJ110705

    55/71

    Affy Gene Chip

    False negative

    Miss what could be important

    Are these samples going to belooked at again?

    False positive

    Waste resources following dead

    ends

  • 8/6/2019 LLJ110705

    56/71

    HIV Screening

    False positive

    Needless worry

    Stigma False negative

    Thinks everything is ok

    Continues to spread disease

    For cholesterol example?

  • 8/6/2019 LLJ110705

    57/71

    What do you need to think

    about?

    Is it worse to treat those who truly

    are not ill or to not treat those who

    are ill? That answer will help guide you as

    to what amount of error you are

    willing to tolerate in your trialdesign.

  • 8/6/2019 LLJ110705

    58/71

    Little Diagnostic Testing

    Lingo False Positive/False Negative

    Positive Predictive Value

    Probability diseased given POSITIVEtest result

    Negative Predictive Value

    Probability NOT diseased given

    NEGATIVE test result Predictive values depend on disease

    prevalence

  • 8/6/2019 LLJ110705

    59/71

    Outline

    Estimation and Hypotheses

    Continuous Outcome/Known Variance

    Test statistic, tests, p-values,confidence intervals

    Unknown Variance

    Different Outcomes/Similar TestStatistics

    Additional Information

  • 8/6/2019 LLJ110705

    60/71

    What you should know: CI

    Meaning/interpretation of the CI

    How to compute a CI for the true

    mean when variance is known(normal model)

    How to compute a CI for the true

    mean when the variance is NOTknown (t distribution)

  • 8/6/2019 LLJ110705

    61/71

    You Need to Know

    How to turn a question into hypotheses

    Failing to reject the null hypothesis

    DOES NOT mean that the null is true

    Every test has assumptions

    A statistician can check all the

    assumptions

    If the data does not meet the assumptions

    there are non-parametric versions of the

    tests (see text)

  • 8/6/2019 LLJ110705

    62/71

    P-value Interpretation Reminders

    Measure of the strength of

    evidence in the data that the null is

    not true

    A random variable whose value

    lies between 0 and 1

    NOT the probability that the null

    hypothesis is true.

  • 8/6/2019 LLJ110705

    63/71

    Avoid Common Mistakes:

    Hypothesis Testing

    If you have paired data, use a

    paired test

    If you dont then you can lose power If you do NOT have paired data, do

    NOT use a paired test

    You can have the wrong inference

  • 8/6/2019 LLJ110705

    64/71

    Common Mistakes:

    Hypothesis Testing

    These tests have assumptions ofindependence

    Taking multiple samples per subject ?

    Statistician MUST know Different statistical analyses MUST be

    used and they can be difficult!

    Distribution of the observations

    Histogram of the observations

    Highly skewed data - t test - incorrectresults

  • 8/6/2019 LLJ110705

    65/71

    Common Mistakes:

    Hypothesis Testing

    Assume equal variances and the

    variances are not equal

    Did not show variance test

    Not that good of a test

    ALWAYS graph your data first to

    assess symmetry and variance

    Not talking to a statistician

  • 8/6/2019 LLJ110705

    66/71

    Misconceptions

    Smaller p-value larger effect

    Effect size is determined by the

    difference in the sample mean orproportion between 2 groups

    P-value = inferential tool

    Helps demonstrate that populationmeans in two groups are not equal

  • 8/6/2019 LLJ110705

    67/71

    Misconceptions

    A small p-value means the difference is

    statistically significant, not that the

    difference is clinically significant

    A large sample size can help get a

    small p-value

    Failing to reject H0 There is not enough evidence to reject H0

    Does NOT mean H0 is true

  • 8/6/2019 LLJ110705

    68/71

    Analysis Follows Design

    Questions Hypotheses

    Experimental Design Samples

    Data Analyses Conclusions

  • 8/6/2019 LLJ110705

    69/71

    Normal/Large Sample Data?

    Binomial?

    Independent? Nonparametric test

    Expected 5

    2 sample Z test for

    proportions or

    contingency table

    McNemars test

    Fishers Exact

    test

    No

    No

    No

    No

    Yes

    Yes

    Yes

    N l/ S l D ?

  • 8/6/2019 LLJ110705

    70/71

    Normal/Large Sample Data?

    Inference on means?

    Independent? Inference on variance?

    Variance

    known?

    Paired t

    Z test

    Variances equal?

    T test w/

    pooled

    variance

    T test w/

    unequal

    variance

    F test for

    variances

    Yes

    Yes

    Yes Yes

    Yes

    Yes

    No

    No

    No

    No

  • 8/6/2019 LLJ110705

    71/71

    Questions?