ReviewChaps3-4


  • 8/14/2019 ReviewChaps3-4


    Review Chaps 3-4

    Chapter 3

    A Closer Look at Assumptions

    Robustness: A statistical procedure is robust to departures from a particular assumption if it is valid even when the assumption is not met. Valid means that the uncertainty measures, such as confidence intervals and p-values, are very nearly equal to those under the assumption. The t procedures assume that the data are normal, that the two populations have equal variances, and that all the observations are independent of one another.

    Departures from the Normality Assumption: The t procedures are robust to departures from normality. Data depart from normality when their distribution is not symmetric and bell- or mound-shaped, or when the tails are too long or too short. While this is subjective to some extent, assessing how severe a departure from normality is forms an important part of your training.

    Departures from the Equal Variances Assumption: These departures can be more serious. This condition is best checked by looking at histograms of both samples as well as the sample standard deviations. Often, so long as the sample sizes are similar, the uncertainty measures will still be reasonably close to the true ones.


    Departures from Independence: These can be caused by serial or spatial correlation or by cluster effects. This assumption can usually be checked by considering the experimental design and data collection procedures. Data that fail to meet the independence assumption require methods other than those presented here.

    What are some examples of data sets that violate the independence assumption?

    Resistance: A statistical procedure is resistant if it does not change very much when a small part of the data changes. The t procedures are not resistant to outliers. Outliers should be identified, the analysis performed both with and without them, and both sets of results reported in the published results of the experiment.
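The lack of resistance can be seen in a small numerical sketch. The two groups below are hypothetical values made up for illustration (they are not from the text), and `pooled_t` is a helper name introduced here for the usual equal-variance two-sample t statistic. Replacing one observation with an extreme value changes the conclusion substantially.

```python
import math
import statistics

def pooled_t(a, b):
    """Two-sample t statistic using the pooled (equal-variance) standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se

# Hypothetical data, for illustration only.
group_a = [10.1, 11.3, 9.8, 10.7, 11.0, 10.4]
group_b = [9.0, 9.6, 8.7, 9.9, 9.2, 9.4]

# Same second group, but with its last value replaced by an outlier.
group_b_outlier = group_b[:-1] + [25.0]

t_clean = pooled_t(group_a, group_b)
t_outlier = pooled_t(group_a, group_b_outlier)

print(f"t without outlier: {t_clean:.2f}")
print(f"t with outlier:    {t_outlier:.2f}")
```

A single changed value moves the statistic from a clearly large value to one near zero, which is why the analysis should be reported both ways.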


    Transformations of the Data: Sometimes departures from normality can be corrected by transformations. The most common transformation is the log transform. There are two common log transformations, and writers often use log to mean either the natural log or the log base 10. In my notes, I will often use ln to mean natural log and log to mean log base 10, or log to mean either. Your text will use log to mean natural log and log10 to mean log base 10, though the use of log10 will be rare. Please do not let this confuse or upset you.

    Multiplicative Treatment Effect under the log Transformation: If an additive effect $\delta$ is found for log-transformed data, i.e.,

    $$\log(y) = \log(x) + \delta,$$

    then, to get back to the original scale,

    $$y = e^{\log(x) + \delta} = e^{\log(x)}\, e^{\delta} = x\, e^{\delta}.$$

    Thus, $e^{\delta}$ is the multiplicative effect on the original scale of measurement. To test for a treatment effect, we use the usual t-test on the log-transformed data.
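A quick numerical check of this back-transformation, as a sketch: the effect size `delta` and the baseline values in `x_values` are hypothetical numbers chosen only for illustration. Adding the same amount on the natural-log scale multiplies every original value by the same factor.

```python
import math

delta = 0.25                   # hypothetical additive effect on the log scale
x_values = [2.0, 10.0, 50.0]   # hypothetical untreated values

ratios = []
for x in x_values:
    y = math.exp(math.log(x) + delta)   # shift on the log scale, then back-transform
    ratios.append(y / x)
    print(f"x = {x:5.1f}  y = {y:7.3f}  y/x = {y / x:.4f}")

# Every ratio equals exp(delta), whatever the starting value x.
print(f"exp(delta) = {math.exp(delta):.4f}")
```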


    Population Model: Estimating the Ratio of Population Medians

    When drawing inferences about population means using a log transform (natural or base 10), there is a problem with transforming back to the original scale of the data, because the mean of the log-transformed sample is not the log of the mean of the original sample. So taking the antilog does not give an estimate of the mean on the original scale.

    However, if the log transformed data have a symmetric distribution then

    mean[log(Y)] = median[log(Y)]

    and

    median[log(Y)] = log[median(Y)].

    In words, the median, or 50th percentile, of the log-transformed values is the log of the 50th percentile of the original values. So, when we transform back to the original scale, we are now drawing inferences about the median.

    If we denote the averages of the log-transformed samples by $\bar{Z}_1$ and $\bar{Z}_2$, then the difference of these two quantities estimates the log of the ratio of the population medians. That is,

    $$\bar{Z}_2 - \bar{Z}_1 \ \text{estimates}\ \log\!\left(\frac{\operatorname{median}(Y_2)}{\operatorname{median}(Y_1)}\right),$$

    where $Y_1$ and $Y_2$ denote measurements from the two populations. Taking the antilog (with natural logs), $e^{\bar{Z}_2 - \bar{Z}_1}$ estimates the ratio of the two population medians.
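As a sketch of the calculation, assuming natural logs and two small hypothetical samples `y1` and `y2` (made-up values, not data from the text): average the logs in each sample, take the difference, and exponentiate. The result is algebraically the ratio of the two sample geometric means, which the last line confirms.

```python
import math
import statistics

# Hypothetical positive, right-skewed samples, for illustration only.
y1 = [1.2, 2.5, 3.1, 4.8, 9.6, 15.0]
y2 = [2.0, 4.4, 6.3, 8.1, 20.5, 31.0]

z1 = [math.log(v) for v in y1]   # log-transformed sample 1
z2 = [math.log(v) for v in y2]   # log-transformed sample 2

# Difference of the log-scale averages estimates log(median(Y2) / median(Y1)).
log_ratio = statistics.mean(z2) - statistics.mean(z1)

# Back-transform to estimate the ratio of population medians.
ratio_estimate = math.exp(log_ratio)
print(f"estimated ratio of medians: {ratio_estimate:.3f}")

# The same number, computed as a ratio of sample geometric means.
gm_ratio = statistics.geometric_mean(y2) / statistics.geometric_mean(y1)
print(f"ratio of geometric means:   {gm_ratio:.3f}")
```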

    Other transformations: There are many transformations one can try. There are rules of thumb, but it usually boils down to trial and error. Some common transformations are the square root, the reciprocal, and the arcsine.


    Wilcoxon Rank-Sum Test

    Say you believe that students who go home to their families for Thanksgiving weekend actually do better on their exams because they need to decompress more than they need to study. Say you took a random sample of 8 students who went home for Thanksgiving and 8 who stayed in Missoula and studied, and then obtained their final exam scores out of a total of 200 possible.

    Here are the resulting data:

    Went Home    Studied
    113.25       137.9134
    95.94        142.4956
    90.04        129.6706
    104.44       115.4934
    119.21       183.4077
    106.88       123.5596
    94.99        94.7618
    131.09       102.0240

    The hypothesis test is then:

    $H_0: \mu_h - \mu_s = 0$ versus $H_a: \mu_h - \mu_s > 0$

    To perform the rank-sum test, we pool the two samples, rank all the observations together, and record which ranks belong to each sample. Once we have the ranks, the test statistic W, the sum of the ranks in one sample, can be calculated.

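    Under the null hypothesis of no difference, we have

    $$\mu_W = \frac{n_1 (N + 1)}{2}, \qquad \operatorname{var}(W) = \frac{n_1 n_2 (N + 1)}{12},$$

    where $n_1$ and $n_2$ are the two sample sizes and $N = n_1 + n_2$.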

    Our two samples are just large enough (each > 7), so we can perform this test by calculating a z-score:

    $$z = \frac{W - \mu_W}{SD(W)}.$$


    Rank    Home (ranked)    Studied (ranked)
    1       90.0400
    2                        94.7618
    3       94.9900
    4       95.9400
    5                        102.0240
    6       104.4400
    7       106.8800
    8       113.2500
    9                        115.4934
    10      119.2100
    11                       123.5596
    12                       129.6706
    13      131.0900
    14                       137.9134
    15                       142.4956
    16                       183.4077

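    We now sum the ranks for the home data. The home scores occupy ranks 1, 3, 4, 6, 7, 8, 10, and 13 in the combined ordering, which yields W = 1+3+4+6+7+8+10+13 = 52.

    For this example, $\mu_W = 8(17)/2 = 68$ and $SD(W) = \sqrt{8 \cdot 8 \cdot 17 / 12} = \sqrt{90.67} \approx 9.52$, so

    $$z = \frac{52 - 68}{9.52} \approx -1.68.$$

    The rank sum for the home group falls below its null expectation, which is the opposite direction from the alternative. So, we fail to reject the null hypothesis of no difference between the two groups.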
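The arithmetic for this example can be checked directly from the sixteen exam scores. The sketch below uses only the data and formulas given in these notes; the variable names are introduced here, and the simple ranking step assumes there are no tied scores (true for these data).

```python
import math

home = [113.25, 95.94, 90.04, 104.44, 119.21, 106.88, 94.99, 131.09]
studied = [137.9134, 142.4956, 129.6706, 115.4934,
           183.4077, 123.5596, 94.7618, 102.0240]

# Rank all 16 scores together (1 = smallest); no ties in these data.
combined = sorted(home + studied)
rank_of = {value: rank for rank, value in enumerate(combined, start=1)}

# Test statistic: the sum of the ranks of the home group.
W = sum(rank_of[v] for v in home)

n1, n2 = len(home), len(studied)
N = n1 + n2
mean_W = n1 * (N + 1) / 2                # null mean of W
sd_W = math.sqrt(n1 * n2 * (N + 1) / 12) # null standard deviation of W
z = (W - mean_W) / sd_W

print(f"W = {W}, mean = {mean_W}, SD = {sd_W:.3f}, z = {z:.3f}")
# W = 52, mean = 68.0, SD = 9.522, z = -1.680
```

Note that the z-score divides by the standard deviation of W, not its variance.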

    Wilcoxon Signed-Rank Test for Paired Data:

    To compute the signed-rank test statistic we perform the following calculations:

    1) Compute the difference in each of the n pairs.
    2) Drop the zeros from the list.
    3) Order the absolute differences from smallest to largest and assign ranks.
    4) The signed-rank statistic, S, is the sum of the ranks from the pairs for which the difference is positive.

    This procedure can be performed in SPSS under Nonparametric Tests > 2 Related Samples; choose Wilcoxon.
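The four steps can be sketched in a few lines. The function name `signed_rank_statistic` and the paired values are hypothetical, introduced only for illustration. The notes do not say how to handle tied absolute differences; this sketch assumes the common convention of giving tied values the average of their ranks.

```python
def signed_rank_statistic(first, second):
    """Return (S, n): the signed-rank statistic and the number of nonzero differences."""
    # Step 1: difference within each pair.
    diffs = [a - b for a, b in zip(first, second)]
    # Step 2: drop the zeros.
    diffs = [d for d in diffs if d != 0]
    # Step 3: order by absolute difference and assign ranks
    # (tied absolute differences share the average rank; an assumption).
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    # Step 4: sum the ranks belonging to positive differences.
    S = sum(r for d, r in zip(ordered, ranks) if d > 0)
    return S, len(ordered)

# Hypothetical paired measurements, for illustration only.
first = [12, 15, 9, 20, 14, 11]
second = [10, 15, 12, 13, 13, 7]

S, n = signed_rank_statistic(first, second)
print(f"S = {S}, n = {n}")   # S = 12.0, n = 5
```

Here the differences are 2, 0, -3, 7, 1, 4; the zero is dropped, the absolute values 1, 2, 3, 4, 7 get ranks 1 to 5, and the positive differences hold ranks 1, 2, 4, and 5.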


    Exact p-value for the Signed-Rank Test: The exact p-value is the proportion of all assignments of outcomes within each pair that lead to a test statistic at least as extreme as the one observed. In the schizophrenia example, we assign "affected" or "not affected" within each pair, regardless of true status, in $2^{15}$ different ways, and then calculate the statistic under each of these assignments. The distribution of the statistics calculated in this way is the exact distribution under the null hypothesis of no treatment effect.
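For a small number of pairs the enumeration can be carried out by brute force. The sketch below is a hypothetical illustration, not the schizophrenia data: it takes the ranks 1 to 5 and an observed statistic of 12, and it treats "at least as extreme" as one-sided (S at least as large as observed), which is an assumption. Each pair's rank is either counted (positive difference) or not, giving 2^n equally likely assignments under the null hypothesis.

```python
from itertools import product

def exact_signed_rank_pvalue(ranks, observed):
    """One-sided exact p-value: proportion of sign assignments with S >= observed."""
    as_extreme = 0
    total = 0
    # Each pair's difference is positive (1) or negative (0) with equal chance.
    for signs in product([0, 1], repeat=len(ranks)):
        s = sum(r for r, positive in zip(ranks, signs) if positive)
        total += 1
        if s >= observed:
            as_extreme += 1
    return as_extreme / total

# Hypothetical example: n = 5 nonzero differences, observed S = 12.
p_value = exact_signed_rank_pvalue([1, 2, 3, 4, 5], 12)
print(f"exact one-sided p-value = {p_value}")   # 5/32 = 0.15625
```

With 15 pairs the same loop would visit 2**15 = 32768 assignments, which is still fast; the count doubles with every additional pair.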

    Normal approximation for the Signed-Rank Test: A normal approximation gives us an expected value and standard deviation:

    $$\mu_S = \frac{n(n+1)}{4}, \qquad SD(S) = \left[\frac{n(n+1)(2n+1)}{24}\right]^{1/2}.$$

    We then compare the usual z-statistic to the quantiles of the normal distribution to obtain

    a p-value in the usual way.
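A minimal sketch of the normal approximation, using the mean and standard deviation formulas above. The observed value `S = 100` with `n = 15` pairs is a hypothetical number chosen for illustration, not a result from the text, and the p-value shown is one-sided (upper tail), computed from the standard normal distribution via `math.erfc`.

```python
import math

def signed_rank_normal_approx(S, n):
    """Return (mean, sd, z, one-sided upper-tail p-value) for the signed-rank statistic."""
    mean_S = n * (n + 1) / 4
    sd_S = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (S - mean_S) / sd_S
    # Upper-tail probability of a standard normal: P(Z >= z).
    p_upper = 0.5 * math.erfc(z / math.sqrt(2))
    return mean_S, sd_S, z, p_upper

# Hypothetical observed statistic, for illustration only.
mean_S, sd_S, z, p_upper = signed_rank_normal_approx(S=100, n=15)
print(f"mean = {mean_S}, SD = {sd_S:.3f}, z = {z:.3f}, p = {p_upper:.4f}")
```

For n = 15 the null mean is 60 and the standard deviation is the square root of 310, about 17.61.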