
Post Hoc Power: A Concept Whose Time Has Come

    Anthony J. Onwuegbuzie

Department of Educational Measurement and Research, University of South Florida

    Nancy L. Leech

Department of Educational Psychology, University of Colorado at Denver and Health Sciences Center

UNDERSTANDING STATISTICS, 3(4), 201–230. Copyright © 2004, Lawrence Erlbaum Associates, Inc.

This article advocates the use of post hoc power analyses. First, reasons for the nonuse of a priori power analyses are presented. Next, post hoc power is defined and its utility delineated. Third, a step-by-step guide is provided for conducting post hoc power analyses. Fourth, a heuristic example is provided to illustrate how post hoc power can help to rule in/out rival explanations in the presence of statistically nonsignificant findings. Finally, several methods are outlined that describe how post hoc power analyses can be used to improve the design of independent replications.

Key words: power, a priori power, post hoc power, significance testing, effect size

For more than 75 years, null hypothesis significance testing (NHST) has dominated the quantitative paradigm, stemming from the seminal works of Fisher (1925/1941) and Neyman and Pearson (1928). NHST was designed to provide a means of ruling out a chance finding, thereby reducing the chance of falsely rejecting the null hypothesis in favor of the alternative hypothesis (i.e., committing a Type I error).

Although NHST has permeated the behavioral and social science field since its inception, its practice has been subjected to severe criticism, with the number of critics growing throughout the years. The most consistent criticism of NHST that has emerged is that statistical significance is not synonymous with practical importance. More specifically, statistical significance does not provide any information about how important or meaningful an observed finding is (e.g., Bakan, 1966; Cahan, 2000; Carver, 1978, 1993; Cohen, 1994, 1997; Guttman, 1985; Loftus, 1996; Meehl, 1967, 1978; Nix & Barnette, 1998; Onwuegbuzie & Daniel, 2003; Rozeboom, 1960; Schmidt, 1992, 1996; Schmidt & Hunter, 1997).

As a result of this limitation of NHST, some researchers (e.g., Carver, 1993) contend that effect sizes, which represent measures of practical importance, should replace statistical significance testing completely. However, reporting and interpreting only effect sizes could lead to the overinterpretation of a finding (Onwuegbuzie & Levin, 2003; Onwuegbuzie, Levin, & Leech, 2003). As noted by Robinson and Levin (1997),

Although effect sizes speak loads about the magnitude of a difference or relationship, they are, in and of themselves, silent with respect to the probability that the estimated difference or relationship is due to chance (sampling error). Permitting authors to promote and publish seemingly interesting or unusual outcomes when it can be documented that such outcomes are not really that unusual would open the publication floodgates to chance occurrences and other strange phenomena. (p. 25)

Moreover, interpreting effect sizes in the absence of a NHST could adversely affect conclusion validity, just as interpreting p values without interpreting effect sizes could lead to conclusion invalidity (Onwuegbuzie, 2001). Table 1 illustrates the four possible outcomes and their conclusion validity when combining p values and effect sizes. This table echoes the decision table that demonstrates the relationship between Type I error and Type II error. It can be seen from Table 1 that conclusion validity occurs when (a) both statistical significance (e.g., p < .05) and a large effect size prevail or (b) both statistical nonsignificance (e.g., p > .05) and a small effect size occur. Conversely, conclusion invalidity exists if (a) statistical nonsignificance (e.g., p > .05) is combined with a large effect size (i.e., Type A error) or (b) statistical significance (e.g., p < .05) is combined with a small effect size (i.e., Type B error). Both Type A and Type B errors suggest that any declaration that the observed finding is meaningful on the part of the researcher would be misleading and hence result in conclusion invalidity (Onwuegbuzie, 2001). In this respect, conclusion validity is similar to what Levin and Robinson (2000) termed conclusion coherence. Levin and Robinson (2000) defined conclusion coherence as "consistency between the hypothesis-testing and estimation phases of the decision-oriented empirical study" (p. 35).

Therefore, as can be seen in Table 1, always interpreting effect sizes regardless of whether statistical significance is observed may lead to a Type A or Type B error being committed. Similarly, Type A and Type B errors also prevail when NHST is used by itself. In fact, conclusion validity is maximized only if both p values and effect sizes are utilized. To this end, Robinson and Levin (1997) proposed a two-step process for making statistical inferences. According to this model, a statistically significant observed finding is followed by the reporting and interpreting of one or more indexes of practical importance; however, no effect sizes are reported in light of a statistically nonsignificant finding. In other words, analysts should determine first whether the observed result is statistically significant (Step 1), and if and only if statistical significance is found, then they should report how large or important the observed finding is (Step 2). In this way, the statistical significance test in Step 1 serves as a gatekeeper for the reporting and interpreting of effect sizes in Step 2.

This two-step process is indirectly endorsed by the latest edition of the Publication Manual of the American Psychological Association (American Psychological Association [APA], 2001):

When reporting inferential statistics (e.g., t tests, F tests, and chi-square), include [italics added] information about the obtained magnitude or value of the test statistic, the degrees of freedom, the probability of obtaining a value as extreme as or more extreme than the one obtained, and the direction of the effect. Be sure to include [italics added] sufficient descriptive statistics (e.g., per-cell sample size, means, correlations, standard deviations) so that the nature of the effect being reported can be understood by the reader and for future meta-analyses. (p. 22)

Three pages later, the APA (2001) stated:

Neither of the two types of probability value directly reflects the magnitude of an effect or the strength of a relationship. For the reader to fully understand the importance of your findings, it is almost always necessary to include some index of effect size or strength of relationship in your Results section. (p. 25)

On the following page, APA (2001) stated that

The general principle to be followed ... is to provide the reader not only with information about statistical significance but also with enough information to assess the magnitude of the observed effect or relationship. (p. 26)


TABLE 1
Possible Outcomes and Conclusion Validity When Null Hypothesis Significance Testing and Effect Sizes Are Combined

                                            Effect Size
p Value                              Large                  Small
Not statistically significant        Type A error           Conclusion validity
Statistically significant            Conclusion validity    Type B error


Most recently, Onwuegbuzie and Levin (2003) proposed a three-step procedure for use when two or more hypothesis tests are conducted within the same study, which involves testing the trend of the set of hypotheses at the third step. Using either the two-step method or the three-step method helps to reduce not only the probability of committing a Type A error but also the probability of committing a Type B error, namely, declaring as important a statistically significant finding with a small effect size (Onwuegbuzie, 2001). However, whereas Type B error almost certainly will be reduced by using one of these methods compared to using NHST alone, the reduction in the probability of Type A error is not guaranteed using these procedures. This is because if statistical power is lacking, then the first step of the two-step method and the first and third steps of the three-step procedure, which serve as gatekeepers for computing effect sizes, may lead to the nonreporting of a nontrivial effect (i.e., Type A error). Simply put, sample sizes that are too small increase the probability of a Type II error (not rejecting a false null hypothesis) and subsequently increase the probability of committing a Type A error.

Clearly, both error probabilities can be reduced if researchers conduct a priori power analyses to select appropriate sample sizes. Unfortunately, however, such analyses are rarely employed (Cohen, 1992; Keselman et al., 1998; Onwuegbuzie, 2002). When a priori power analyses have been omitted, researchers should conduct post hoc power analyses, especially for nonstatistically significant findings. This would help researchers determine whether low power threatens the internal validity of findings (i.e., Type A error). Yet, researchers typically have not used this technique.

Thus, in this article, we advocate the use of post hoc power analyses. First, we present reasons for the nonuse of a priori power analyses. Next, we define post hoc power and delineate its utility. Third, we provide a step-by-step guide for conducting post hoc power analyses. Fourth, we provide a heuristic example to illustrate how post hoc power can help to rule in/out rival explanations in the presence of statistically nonsignificant findings. Finally, we outline several methods that describe how post hoc power analyses can be used to improve the design of independent replications.

    DEFINITION OF STATISTICAL POWER

Neyman and Pearson (1933a, 1933b) were the first to discuss the concepts of Type I and Type II error. Type I error occurs when the researcher rejects the null hypothesis when it is true. As noted previously, the Type I error probability is determined by the significance level (α). For example, if a 5% level of significance is designated, then the Type I error rate is 5%. Stated another way, α represents the conditional probability of rejecting the null hypothesis when decisions are made over repeated samples from the same population under the same null and alternative hypothesis, assuming the null hypothesis is true. Conversely, Type II error occurs when the analyst accepts the null hypothesis when the alternative hypothesis is true. The conditional probability of making a Type II error under the alternative hypothesis is denoted by β.

Statistical power is the conditional probability of rejecting the null hypothesis (i.e., accepting the alternative hypothesis) when the alternative hypothesis is true. The most common definition of power comes from Cohen (1988), who defined the power of a statistical test as "the probability [assuming the null hypothesis is false] that it will lead to the rejection of the null hypothesis, i.e., the probability that it will result in the conclusion that the phenomenon exists" (p. 4). Power can be viewed as how likely it is that the researcher will find a relationship or difference that really prevails. It is given by 1 − β.
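This definition can be made concrete by simulation: draw many samples under a true alternative hypothesis and record how often the null hypothesis is rejected; the long-run rejection rate is the power. The following is a minimal illustrative sketch (ours, not the article's), assuming Python with numpy and scipy:

```python
# Illustrative simulation: power as the long-run proportion of samples in
# which a true effect yields p < alpha (here, an independent-samples t test).
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
d, n, alpha, reps = 0.5, 64, 0.05, 5000  # medium effect, 64 per group
rejections = 0
for _ in range(reps):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(d, 1.0, n)            # the alternative hypothesis is true
    if ttest_ind(a, b).pvalue < alpha:
        rejections += 1
print(rejections / reps)                 # approximately .80, i.e., 1 - beta
```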

Statistical power estimates are affected by three factors.¹ The first factor is the level of significance. Holding all other aspects constant, increasing the level of significance increases power but also increases the probability of rejecting the null hypothesis when it is true. The second influential factor is the effect size. Specifically, the larger the difference between the value of the parameter under the null hypothesis and the value of the parameter under the alternative hypothesis, the greater the power to detect it. The third instrumental component is the sample size. The larger the sample size, the greater the likelihood of rejecting the null hypothesis² (Chase & Tucker, 1976; Cohen, 1965, 1969, 1988, 1992). However, caution is needed due to the fact that large sample sizes can increase power to the point of rejecting the null hypothesis when the associated observed finding is not practically important.

Cohen (1965), in accordance with McNemar (1960), recommended a probability of .80 or greater for correctly rejecting the null hypothesis representing a medium effect at the 5% level of significance. This recommendation was based on considering the ratio of the probability of committing a Type I error (i.e., 5%) to the probability of committing a Type II error (i.e., 1 − .80 = .20). In this case, the ratio was 1:4, reflecting the contention that Type I errors are generally more serious than are Type II errors.

Power, level of significance, effect size, and sample size are related such that any one of these components is a function of the other three components. As noted by Cohen (1988), "when any three of them are fixed, the fourth is completely determined" (p. 14). Thus, there are four possible types of power analyses in which one of the parameters is determined as a function of the other three, as follows: (a) power as a function of level of significance, effect size, and sample size; (b) effect size as a function of level of significance, sample size, and power; (c) level of significance as a function of sample size, effect size, and power; and (d) sample size as a function of level of significance, effect size, and power (Cohen, 1965, 1988). The latter type of power analysis is the most popular and most useful for planning research studies (Cohen, 1992). This form of power analysis, which is called an a priori power analysis, helps the researcher to ascertain the sample size necessary to obtain a desired level of power for a specified effect size and level of significance. Conventionally, most researchers set the power coefficient at .80 and the level of significance at .05. Thus, once the expected effect size and type of analysis are specified, then the sample size needed to meet all specifications can be determined.

¹Recently, Onwuegbuzie and Daniel (2004) demonstrated empirically that power also is affected by score reliability. Specifically, Onwuegbuzie and Daniel showed that the higher the score reliability, the greater the statistical power.
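The fourth type of analysis, sample size as a function of the other three components, is readily carried out with standard software. A hedged sketch, assuming Python's statsmodels package, for the conventional specifications just described (power = .80, α = .05) and a hypothesized medium effect:

```python
# A minimal sketch of an a priori power analysis: solve for the per-group
# sample size needed to detect a hypothesized effect with a t test.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.5,          # hypothesized Cohen's d (a medium effect)
    alpha=0.05,               # level of significance
    power=0.80,               # conventional target power
    alternative="two-sided",
)
print(round(n_per_group))     # roughly 64 participants per group
```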

The value of an a priori power analysis is that it helps the researcher in planning research studies (Sherron, 1988). By conducting such an analysis, researchers put themselves in the position to select a sample size that is large enough to lead to a rejection of the null hypothesis for a given effect size. Alternatively stated, a priori power analyses help researchers to obtain the necessary sample sizes to reach a decision with adequate power. Indeed, the optimum time to conduct a power analysis is during the research design phase (Wooley & Dawson, 1983).

Failing to consider statistical power can have dire consequences for researchers. First and foremost, low statistical power reduces the probability of rejecting the null hypothesis and therefore increases the probability of committing a Type II error (Bakan, 1966; Cohen, 1988). It may also increase the probability of committing a Type I error (Overall, 1969), yield misleading results in power studies (Chase & Tucker, 1976), and prevent potentially important studies from being published as a result of publication bias (Greenwald, 1975); this bias has been called the file-drawer problem, representing the tendency to keep statistically nonsignificant results in file drawers (Rosenthal, 1979).

It has been exactly 40 years since Jacob Cohen (1962) conducted the first survey of power. In this seminal work, Cohen assessed the power of studies published in the abnormal-social psychology literature. Using the reported sample size and a nondirectional significance level of 5%, Cohen calculated the average power to detect a hypothesized effect (i.e., hypothesized power) across the 70 selected studies for nine frequently used statistical tests using small, medium, and large estimated effect size values. The average power of the 2,088 major statistical tests was .18, .48, and .83 for detecting a small, medium, and large effect size, respectively. The average hypothesized statistical power of .48 for medium effects indicated that studies in the abnormal psychology field had, on average, less than a 50% chance of correctly rejecting the null hypothesis (Brewer, 1972; Halpin & Easterday, 1999).

During the next three decades after Cohen's (1962) investigation, several researchers conducted hypothetical power surveys across a myriad of disciplines, including education (Brewer & Owen, 1973), communication (Chase & Tucker, 1975; Katzer & Sodt, 1973), communication disorders (Kroll & Chase, 1975), mass communication (Chase & Baran, 1976), counselor education (Haase, 1974), social work education (Orme & Tolman, 1986), science education (Penick & Brewer, 1972; Wooley & Dawson, 1983), English education (Daly & Hexamer, 1983), gerontology (Levenson, 1980), marketing research (Sawyer & Ball, 1981), and mathematics education (Halpin & Easterday, 1999). The average hypothetical power of these 15 studies was .24, .63, and .85 for small, medium, and large effects, respectively. Assuming that a medium effect size is appropriate for use in most studies because of its combination of being practically meaningful and realistic (Cohen, 1965; Cooper & Findley, 1982; Haase, Waechter, & Solomon, 1982), the average power of .63 across these studies is disturbing. Similarly disturbing is the average hypothesized power of .64 for a medium effect reported by Rossi (1990) across 25 power surveys involving more than 1,500 journal articles and 40,000 statistical tests.

An even more alarming picture is painted by Schmidt and Hunter (1997), who reported that "the average [hypothesized] power of null hypothesis significance tests in typical studies and research literature is in the .40 to .60 range (Cohen, 1962, 1965, 1988, 1992; Schmidt, 1996; Schmidt, Hunter, & Urry, 1976; Sedlmeier & Gigerenzer, 1989) [with] .50 as a rough average" (p. 40). Unfortunately, an average hypothetical power of .5 indicates that more than one half of all statistical tests in the social and behavioral science literature will be statistically nonsignificant. As noted by Schmidt and Hunter (1997), "This level of accuracy is so low that it could be achieved just by flipping a (unbiased) coin!" (p. 40). Yet, the fact that power is unacceptably low in most studies suggests that misuse of NHST is to blame, not the logic of NHST. Moreover, the publication bias that prevails in research suggests that the hypothetical power estimates provided previously likely represent an upper bound. Thus, as declared by Rossi (1997), it is possible that "at least some controversies in the social and behavioral sciences may be artifactual in nature" (p. 178). Indeed, it can be argued that low statistical power represents more of a research design issue than a statistical issue because it can be rectified by using a larger sample.

Bearing in mind the importance of conducting statistical power analyses, it is extremely surprising that very few researchers conduct and report power analyses for their studies (Brewer, 1972; Cohen, 1962, 1965, 1988, 1992; Keselman et al., 1998; Onwuegbuzie, 2002; Sherron, 1988), even though statistical power has been promoted actively since the 1960s (Cohen, 1962, 1965, 1969) and even though, for many types of statistical analyses (e.g., r, z, F, χ²), tables have been provided by Cohen (1988, 1992) to determine the necessary sample size. Even when a priori power has been calculated, it is rarely reported (Wooley & Dawson, 1983). This lack of power analyses still prevails despite the recommendations of the APA (2001).


The lack of use of power analysis might be the result of one or more of the following factors. First and foremost, evidence exists that statistical power is not sufficiently understood by researchers (Cohen, 1988, 1992). Second, it appears that the concept and applications of power are not taught in many undergraduate- and graduate-level statistics courses. Moreover, when power is taught, it is likely that inadequate coverage is given. Disturbingly, Mundfrom, Shaw, Thomas, Young, and Moore (1998) reported that the issue of statistical power is regarded by instructors of research methodology, statistics, and measurement as being only the 34th most important topic in their fields out of the 39 topics presented. Also in the Mundfrom et al. study, power received the same low ranking with respect to coverage in the instructors' classes. Clearly, if power is not given high status in quantitative-based research courses, then students similarly will not take it seriously. In any case, these students will not be suitably equipped to conduct such analyses.

Another reason for the spasmodic use of statistical power possibly stems from the incongruency between endorsement and practice. For instance, although APA (2001) stipulated that power analyses be conducted, the manual provides several NHST examples but no examples of how to report statistical power (Fidler, 2002). Harris (1997) also provided an additional rationale for the lack of power analyses:

I suspect that this low rate of use of power analysis is largely due to the lack of proportionality between the effort required to learn and execute power analyses (e.g., dealing with noncentral distributions or learning the appropriate effect-size measure with which to enter the power tables in a given chapter of Cohen, 1977) and the low payoff from such an analysis (e.g., the high probability that resource constraints will force you to settle for a lower N than your power analysis says you should have), especially given the uncertainties involved in a priori estimates of effect sizes and standard deviations, which render the resulting power calculation rather suspect. If calculation of the sample size needed for adequate power and for choosing between alternative interpretations of a nonsignificant result could be made more nearly equal in difficulty to the effort we've grown accustomed to putting into significance testing itself, more of us might in fact carry out these preliminary and supplementary analyses. (p. 165)

A further reason why a priori power analyses are not conducted likely stems from the fact that the most commonly used statistical packages, such as the Statistical Package for the Social Sciences (SPSS; SPSS Inc., 2001) and the Statistical Analysis System (SAS Institute Inc., 2002), do not allow researchers directly to conduct power analyses. Furthermore, the statistical software programs that conduct power analyses (e.g., Erdfelder, Faul, & Buchner, 1996; Morse, 2001), although extremely useful, typically do not conduct other types of analyses and can be expensive. Even when researchers have power software in their possession, the lack of information regarding components needed to calculate power (e.g., effect size, variance) serves as an additional impediment to a priori power analyses.

It is likely that the lack of power analyses, coupled with a publication bias, promulgates the publishing of findings that are statistically significant but have small effect sizes (possibly increasing Type B error) as well as leading researchers to eliminate valuable hypotheses (Halpin & Easterday, 1999). Thus, we recommend in the strongest possible manner that all quantitative researchers conduct a priori power analyses whenever possible. These analyses should be reported in the Method section of research reports. This report also should include a rationale for the criteria used for all input variables (i.e., power, significance level, effect size; APA, 2001; Cohen, 1973, 1988). Inclusion of such analyses will help researchers to make optimum choices on the components (e.g., sample size, number of variables studied) needed to design a trustworthy study.

    POST HOC POWER ANALYSES

Whether or not an a priori power analysis is undertaken and reported, problems can still arise. One problem that commonly occurs in educational research is when the study is completed and a statistically nonsignificant result is found. In many cases, the researcher then disregards the study (i.e., file-drawer problem) or, when he or she submits the final report to a journal for review, finds it is rejected (i.e., publication bias). Unfortunately, most researchers do not determine whether the statistically nonsignificant result is the result of insufficient statistical power. That is, without knowing the power of the statistical test, it is not possible to rule in or rule out low statistical power as a threat to internal validity (Onwuegbuzie, 2003). Nor can an a priori power analysis necessarily rule in/out this threat. This is because a priori power analyses involve the use of a priori estimates of effect sizes and standard deviations (Harris, 1997). As such, a priori power analyses do not represent the power to detect the observed effect of the ensuing study; rather, they represent the power to detect hypothesized effects. Before the study is conducted, researchers do not know what the observed effect size will be. All they can do is try to estimate it based on previous research and theory (Wilkinson & the Task Force on Statistical Inference, 1999). The observed effect size could end up being much smaller or much larger than the hypothesized effect size on which the power analysis is undertaken. (Indeed, this is a criticism of the power surveys highlighted previously; Mulaik, Raju, & Harshman, 1997.) In particular, if the observed effect size is smaller than what was proposed, the sample size yielded by the a priori power analysis might be smaller than is needed to detect it. In other words, a smaller effect size than anticipated increases the chances of Type II error.

A solution is to use the observed effect size, once the study has been completed, to investigate the performance of an NHST (Mulaik et al., 1997; Schmidt, 1996; Sherron, 1988). Such a technique leads to what is often called a post hoc power analysis. Interestingly, several authors have recommended the use of post hoc power analyses for statistically nonsignificant findings (Cohen, 1969; Dayton, Schafer, & Rogers, 1973; Fagley, 1985; Fagley & McKinney, 1983; Sawyer & Ball, 1981; Wooley & Dawson, 1983).

When post hoc power should be reported has been the subject of debate. Although some researchers advocate that post hoc power always be reported (e.g., Wooley & Dawson, 1983), the majority of researchers advocate reporting post hoc power only for statistically nonsignificant results (Cohen, 1965; Fagley, 1985; Fagley & McKinney, 1983; Sawyer & Ball, 1981). However, both sets of analysts have agreed that estimating the power of significance tests that yield statistically nonsignificant findings plays an important role in their interpretations (e.g., Fagley, 1985; Fagley & McKinney, 1983; Sawyer & Ball, 1981; Tversky & Kahneman, 1971). Specifically, statistically nonsignificant results in a study with low power suggest ambiguity. Conversely, statistically nonsignificant results in a study with high power contribute to the body of knowledge because low power can be ruled out as a threat to internal validity (e.g., Fagley, 1985; Fagley & McKinney, 1983; Sawyer & Ball, 1981; Tversky & Kahneman, 1971). To this end, statistically nonsignificant results can make a greater contribution to the research community than they presently do. As noted by Fagley (1985), "Just as rejecting the null does not guarantee large and meaningful effects, accepting the null does not preclude interpretable results" (p. 392).

Conveniently, post hoc power analyses can be conducted relatively easily because some of the major statistical software programs compute post hoc power estimates. In fact, post hoc power coefficients are available in SPSS for the general linear model (GLM). Post hoc power (or observed power, as it is called in the SPSS output) is based on taking the observed effect size as the assumed population effect, which produces a positively biased but consistent estimate of the effect (D. Nichols,³ personal communication, November 4, 2002). For example, the post hoc power procedure for analyses of variance (ANOVAs) and multiple ANOVAs is contained within the options button.⁴ It should be noted that, due to sampling error, the observed effect size used to compute the post hoc power estimate might be very different from the true (population) effect size, culminating in a misleading evaluation of power.

³David Nichols is a Principal Support Statistician and Manager of Statistical Support, SPSS Technical Support.

⁴Details of the post hoc power computations in the statistical algorithms for the GLM and


PROCEDURE FOR CONDUCTING POST HOC POWER ANALYSES

Although post hoc power analysis procedures exist for multivariate tests of statistical significance such as multivariate analysis of variance and multivariate analysis of covariance (cf. Cohen, 1988), these methods are sufficiently complex to warrant use of statistical software. Thus, for this article, we restrict our attention to post hoc power analysis procedures for univariate tests of statistical significance. What follows is a description of how to undertake a post hoc power analysis by hand for univariate tests, that is, for situations in which there is one quantitative dependent variable and one or more independent variables. Here, we use Cohen's (1988) framework for conducting post hoc power analyses for multiple regression and correlation analysis. Furthermore, these steps can also be used for a priori power analyses by substituting a hypothesized effect size value for the observed effect size value. As noted by Cohen (1988), the advantage of this framework is that it can be used for any type of relationship between the independent variable(s) and dependent variable (e.g., linear or curvilinear, whole or partial, general or partial). Furthermore, the independent variables can be quantitative or qualitative (i.e., nominal scale), direct variates or covariates, and main effects or interactions. Also, the independent variables can represent ex post facto research or retrospective research and can be naturally occurring or the results of experimental manipulation. Most important, however, because multiple regression subsumes all other univariate statistical techniques as special cases (Cohen, 1968), this post hoc power analysis framework can be used for other members of the univariate GLM, including correlation analysis, t tests, ANOVA, and analysis of covariance.

The F test can be used for all univariate members of the GLM to test the null hypothesis that the proportion of variance in the dependent variable that is accounted for by one or more independent variables (PVi) is zero in the population. This F test can be expressed as

$$F = \frac{PV_i / df_i}{PV_e / df_e}, \qquad (1)$$

where PVi is the proportion of the variance in the dependent variable explained by the set of independent variables, PVe is the proportion of error variance, dfi is the degrees of freedom for the numerator (i.e., the number of independent variables in the set), and dfe is the degrees of freedom for the denominator (i.e., degrees of freedom for the error variance). Equation 1 can be rewritten as

$$F = \frac{PV_i}{PV_e} \times \frac{df_e}{df_i}. \qquad (2)$$


The left-hand term in Equation 2, which represents the proportion of variance in the dependent variable that is accounted for by the set of independent variables relative to the proportion of error variance, is a measure of the effect size (ES) in the sample. The right-hand term in Equation 2, namely, the study size or sensitivity of the statistical test, provides information about the sample size and number of independent variables. Thus, the degree of statistical significance is a function of the product of the effect size and the study size (Cohen, 1988).

Because most theoretically selected independent variables have a nonzero relationship (i.e., nonzero effect size) with dependent variables, the distribution of F values in any particular investigation likely takes the form of a noncentral F distribution (Murphy & Myors, 1998). Thus, the (post hoc) power of a statistical significance test can be defined as "the proportion of the noncentral distribution that exceeds the critical value used to define statistical significance" (Murphy & Myors, 1998, p. 24). The shape of the noncentral F distribution, as well as its range, is a function of both the noncentrality parameter and the respective degrees of freedom (i.e., dfi and dfe). The noncentrality parameter, which is an index of the extent to which the null hypothesis is false, is given by

$$\lambda = ES(df_i + df_e + 1), \qquad (3)$$

where ES, the effect size, is given by

$$ES = \frac{PV_i}{PV_e}, \qquad (4)$$

and dfi and dfe are the numerator and denominator degrees of freedom, respectively. Once the noncentrality parameter, λ, has been calculated, Table 2 or Table 3, depending on the level of α, can be used to determine the post hoc power of the statistical test. Specifically, these tables are entered for values of α, λ, dfi, and dfe.

Step 1: Statistical Significance Criterion (α)

Table 2 represents α = .05, and Table 3 represents α = .01. (Interpolation and extrapolation techniques can be used for values between .01 and .05 and outside these limits, respectively.)

Step 2: The Noncentrality Parameter (λ)

In Tables 2 and 3, power values are presented for 15 values of λ. Because λ is a continuous variable, linear interpolation typically is needed.
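Where software is available, the table lookup in these steps can be bypassed by evaluating the noncentral F distribution directly. A minimal sketch of that computation (our illustration, not the article's procedure), assuming Python with scipy; the input values below are hypothetical:

```python
# Post hoc power via Equations 3 and 4 and the noncentral F distribution:
# the power is the mass of the noncentral F beyond the central critical value.
from scipy.stats import f, ncf

def post_hoc_power(pv_i, pv_e, df_i, df_e, alpha=0.05):
    es = pv_i / pv_e                      # Equation 4: effect size
    lam = es * (df_i + df_e + 1)          # Equation 3: noncentrality parameter
    f_crit = f.ppf(1 - alpha, df_i, df_e) # critical value under the null
    return 1 - ncf.cdf(f_crit, df_i, df_e, lam)

# Hypothetical example: 5% of variance explained, 2 predictors, df_e = 60
print(round(post_hoc_power(pv_i=0.05, pv_e=0.95, df_i=2, df_e=60), 2))
```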


TABLE 2
Power of the F Test As a Function of λ, dfi, and dfe

dfi  dfe   λ = 2  4  6  8 10 12 14 16 18 20 24 28 32 36 40
  1   20      27 48 64 77 85 91 95 97 98 99  *
  1   60      29 50 67 79 88 92 96 98 99 99  *
  1  120      29 51 68 80 88 93 96 98 99 99  *
  1    ∞      29 52 69 81 89 93 96 98 99 99  *
  2   20      20 36 52 65 75 83 88 92 95 97 99  *
  2   60      22 40 56 69 79 87 91 95 97 98  *
  2  120      22 41 57 71 80 87 92 95 97 98  *
  2    ∞      23 42 58 72 82 88 93 96 97 99  *
  3   20      17 30 44 56 67 75 82 87 91 94 97 99  *
  3   60      19 34 49 62 73 81 87 92 95 97 98  *
  3  120      19 35 50 64 75 83 89 93 95 97 99  *
  3    ∞      19 36 52 65 76 84 90 93 96 98 99  *
  4   20      15 26 38 49 60 69 76 83 87 91 95 98 99  *
  4   60      17 30 44 57 68 77 83 89 92 95 98 99  *
  4  120      17 31 46 58 70 78 85 90 93 96 98 99  *
  4    ∞      17 32 47 60 72 80 87 91 94 96 99  *
  5   20      13 23 34 44 54 63 71 78 83 87 93 96 98 99  *
  5   60      15 27 40 52 63 72 80 86 90 93 97 99  *
  5  120      16 29 41 54 65 75 82 87 91 94 98 99  *
  5    ∞      16 29 43 56 68 77 84 89 93 95 98 99  *
  6   20      12 21 30 40 50 59 66 73 79 84 91 95 97 99  *
  6   60      14 25 37 48 59 68 76 83 87 91 96 98 99  *
  6  120      14 27 39 50 62 71 79 85 89 93 97 99 99  *
  6    ∞      15 27 40 53 64 74 81 87 91 94 97 99  *
  7   20      11 19 28 37 46 54 62 69 75 80 88 93 96 98 99
  7   60      17 24 35 45 56 65 73 80 85 89 94 97 99 99  *
  7  120      13 25 37 47 59 68 76 82 87 91 96 98 99  *
  7    ∞      14 25 38 50 61 71 79 85 89 93 97 99 99  *
  8   20      10 18 26 34 42 50 58 65 71 76 85 91 94 97 98
  8   60      12 23 33 43 52 62 70 77 83 87 93 97 98 99  *
  8  120      12 24 35 45 55 65 73 80 85 89 95 98 99  *
  8    ∞      13 24 36 48 59 68 77 83 88 92 96 99 99  *
  9   20      10 17 24 32 39 47 54 61 68 73 82 88 93 96 97
  9   60      11 21 31 41 50 58 67 74 80 85 92 96 98 99  *
  9  120      11 22 33 44 53 62 71 78 83 88 94 97 99  *
  9    ∞      13 23 34 45 56 66 74 81 86 90 95 98 99  *
 10   20      09 16 23 30 37 44 51 58 64 70 79 86 91 94 96
 10   60      10 20 30 39 48 56 65 72 78 83 90 95 97 99 99
 10  120      11 21 31 41 51 60 69 75 81 86 93 96 98 99  *
 10    ∞      12 21 32 43 54 64 72 79 85 89 94 98 99  *
 11   20      09 15 21 27 34 41 48 55 61 67 76 84 89 93 96
 11   60      11 18 27 36 45 54 62 70 76 81 89 94 97 98 99
 11  120      11 19 29 39 49 58 67 74 80 85 91 96 98 99  *
 11    ∞      12 21 31 42 52 62 70 78 83 89 94 97 99 99  *
 12   20      08 14 20 27 33 39 45 52 58 64 74 81 87 91 94
 12   60      09 18 28 36 44 52 59 67 73 79 87 93 96 98 99
 12  120      10 20 30 40 48 56 63 72 78 83 90 95 97 99 99
 12    ∞      11 20 30 40 50 60 69 76 82 87 93 97 98 99  *
 13   20      09 13 19 24 30 37 43 49 55 61 71 79 85 90 93
 13   60      10 16 24 33 41 50 58 65 71 77 86 92 95 97 99
 13  120      10 18 26 36 45 54 62 70 76 81 89 94 97 98 99
 13    ∞      11 19 29 40 50 59 67 75 81 85 92 96 98 99  *
 14   20      09 13 18 23 29 35 41 47 52 58 68 76 83 88 92
 14   60      10 16 23 31 40 48 56 63 69 75 84 90 94 97 98
 14  120      10 17 25 34 43 52 61 68 74 80 88 93 97 98 99
 14    ∞      11 19 28 38 48 58 65 73 79 84 91 96 98 99  *
 15   20      08 12 17 22 27 33 39 44 50 55 65 74 81 86 90
 15   60      09 15 22 30 38 46 54 61 67 73 73 89 94 96 98
 15  120      10 16 24 33 42 51 59 66 73 78 78 92 96 98 99
 15    ∞      10 18 27 37 47 56 64 72 78 83 83 95 97 99 99
 18   20      08 11 15 19 24 29 34 39 44 49 58 67 74 80 85
 18   60      09 14 20 27 34 41 48 55 62 68 78 85 91 94 97
 18  120      09 15 22 30 38 46 54 61 68 74 83 90 94 97 98
 18    ∞      10 17 25 34 43 52 60 68 76 80 88 93 97 98 99
 20   20      08 11 14 18 22 26 31 36 40 45 54 63 70 77 82
 20   60      08 13 19 25 31 38 45 52 58 64 75 83 89 93 96
 20  120      09 14 21 28 36 43 51 58 65 71 81 88 93 96 98
 20    ∞      09 16 24 32 41 50 58 65 72 78 87 92 96 98 99
 24   20      07 10 13 16 19 23 27 31 35 39 47 55 62 69 75
 24   60      08 12 17 22 28 34 40 46 52 58 69 78 84 89 93
 24  120      08 13 19 25 32 39 46 53 60 66 76 84 90 94 96
 24    ∞      09 15 21 29 37 46 54 61 68 74 83 90 94 97 98
 30   20      07 09 11 14 16 19 22 25 29 32 39 45 53 59 65
 30   60      08 11 15 19 24 29 34 40 45 51 61 70 78 84 89
 30  120      08 12 16 22 28 34 40 46 53 59 69 78 85 90 94
 30    ∞      08 13 19 26 33 41 48 56 62 69 79 87 92 95 97
 40   20      07 08 10 11 13 15 18 20 22 25 30 35 41 47 52
 40   60      07 10 12 16 19 23 27 32 36 41 50 59 67 74 80
 40  120      07 10 14 18 23 28 33 38 44 49 60 69 77 83 88
 40    ∞      08 12 17 22 29 35 42 49 55 61 72 81 87 92 95
 48   20      07 08 09 10 12 14 15 17 19 21 25 30 35 39 44
 48   60      07 09 11 14 17 20 24 28 31 35 44 52 59 67 73
 60   20      07 08 09 10 12 13 14 16 18 19 23 26 30 34 38
 60   60      07 08 10 12 15 17 20 23 26 29 36 43 50 57 63
 60  120      07 09 11 14 17 21 25 28 33 37 46 54 62 70 76
 60    ∞      07 10 14 18 23 28 34 39 45 01 62 71 79 85 70
120   20      06 07 08 08 09 10 10 11 12 12 14 15 17 19 20
120   60      06 07 08 09 10 11 12 13 15 16 19 23 26 30 34
120  120      06 07 08 10 11 13 15 17 19 21 26 31 36 41 47
120    ∞      06 08 11 13 16 19 23 24 31 35 43 52 60 68 74

Note. α = .05. From Statistical Power Analysis for the Behavioral Sciences (2nd ed., pp. 420–423), by J. Cohen, 1988, Hillsdale, NJ: Lawrence Erlbaum Associates, Inc. Reprinted with permission.
*Power values here and to the right are > .995.

TABLE 3
Power of the F Test As a Function of λ, dfi, and dfe

dfi  dfe   λ = 2  4  6  8 10 12 14 16 18 20 24 28 32 36 40
  1   20      10 23 37 51 63 73 80 86 90 94 97 99  *
  1   60      12 26 42 57 69 79 86 91 94 96 99  *
  1  120      12 27 44 58 71 80 87 92 95 97 99  *
  1    ∞      12 28 45 60 72 81 88 92 95 97 99  *
  2   20      06 15 26 37 48 58 67 75 81 86 93 96 98 99
  2   60      08 18 30 45 57 68 76 83 88 92 97 99 99  *
  2  120      08 19 33 46 59 70 78 85 90 93 97 99  *
  2    ∞      08 20 35 49 61 72 80 87 91 94 98 99  *
  3   20      05 11 20 29 39 48 57 65 72 78 87 93 96 98
  3   60      06 14 25 37 49 60 69 77 83 88 94 97 99  *
  3  120      06 15 27 39 51 62 72 79 85 90 95 98 99  *
  3    ∞      07 16 29 42 54 65 74 82 87 91 96 98  *
  4   20      04 09 16 23 32 41 49 57 64 71 81 89 93 96
  4   60      05 12 21 31 42 53 62 71 78 83 91 96 98 99
  4  120      05 13 23 34 45 56 66 74 81 86 93 97 98  *
  4    ∞      06 14 25 37 49 60 69 77 84 89 95 98 99  *
  5   20      03 07 13 20 27 35 43 50 58 94 76 84 90 93  *
  5   60      04 10 18 28 37 48 57 66 73 79 88 94 97 98
  5  120      04 11 20 30 41 51 61 70 77 83 91 95 98 99
  5    ∞      05 12 22 33 44 55 65 74 80 86 93 97 99 99
  6   20      03 06 11 17 23 30 37 45 52 58 70 79 86 91  *
  6   60      04 09 16 24 34 43 52 61 69 75 85 92 96 98
  6  120      04 10 17 27 37 47 57 66 73 79 89 94 97 99
  7   20      03 05 09 15 20 26 33 40 46 53 65 74 82 88 99
  7   60      04 08 14 22 30 39 48 57 64 71 82 90 94 97  *
  7  120      04 09 16 24 34 43 53 62 69 76 86 93 96 96
  7    ∞      04 10 18 27 37 48 58 67 74 81 90 95 98 99
  8   20      02 05 08 13 18 23 29 36 42 48 60 70 78 84 98
  8   60      03 07 13 20 27 36 44 53 61 67 79 87 93 96  *
  8  120      03 08 14 22 31 40 49 58 66 73 84 91 95 98
  8    ∞      04 09 16 25 35 45 55 64 72 78 88 94 97 99
  9   20      02 04 07 11 16 21 26 32 38 44 55 65 74 81 86
  9   60      03 06 11 18 25 33 41 49 57 64 76 85 91 95 97
  9  120      03 07 13 20 29 37 46 55 63 70 81 89 94 97 98
  9    ∞      04 08 15 23 33 42 52 61 69 76 86 93 96 98 99
 10   20      02 04 06 10 14 19 24 29 34 40 51 61 70 77 83
 10   60      02 06 10 16 23 30 38 46 54 61 73 83 89 94 96
 10  120      02 07 12 19 27 35 44 52 60 67 79 87 93 96 98
 10    ∞      03 08 14 22 31 40 49 58 66 74 84 91 96 98 99
 11   20      02 04 07 10 13 17 22 26 31 37 47 57 66 74 80
 11   60      03 05 10 15 21 29 36 44 51 58 71 80 87 92 95
 11  120      03 06 11 17 25 33 41 50 58 65 77 86 92 95 97
 11    ∞      03 07 13 20 29 38 47 56 64 71 83 90 95 97 99
 12   20      02 03 05 08 12 15 20 24 29 34 44 53 62 70 77
 12   60      02 05 09 14 20 26 33 41 48 55 68 78 85 91 94
 12  120      02 06 11 17 24 31 38 47 55 62 75 84 90 95 97
 12    ∞      03 07 12 19 27 36 45 54 62 69 81 89 94 97 99
 13   20      02 04 06 08 11 14 18 22 26 31 40 50 59 67 74
 13   60      02 05 08 13 18 25 32 39 46 52 65 76 84 89 93
 13  120      02 05 10 15 22 29 37 45 53 60 72 82 89 93 96
 13    ∞      03 06 11 18 26 35 44 52 60 68 80 88 93 96 98
 14   20      02 04 05 08 10 13 17 20 24 29 38 47 55 63 70
 14   60      02 05 08 12 17 23 30 36 43 50 62 73 82 88 92
 14  120      02 05 09 14 21 28 35 43 50 58 70 80 88 92 96
 14    ∞      03 06 11 17 25 33 42 50 59 66 78 87 92 96 98
 15   20      02 03 05 07 09 12 15 19 23 27 35 44 52 60 67
 15   60      02 04 07 11 16 22 28 34 41 47 60 71 80 86 91
 15  120      02 05 09 13 19 26 33 41 48 55 68 78 86 90 95
 15    ∞      02 06 10 16 24 32 40 49 57 64 77 86 92 95 98
 18   20      02 03 04 06 08 10 13 15 18 22 29 36 44 51 59
 18   60      02 04 06 10 14 18 23 29 35 41 53 64 74 81 87
 18  120      02 04 07 11 17 22 29 36 43 49 62 73 82 88 93
 18    ∞      02 05 09 14 21 28 36 44 52 59 72 82 89 94 96
 20   20      02 03 04 05 07 09 11 14 16 19 25 32 39 46 53
 20   60      02 04 06 09 12 16 21 26 32 37 49 60 70 78 84

Note. α = .01. From Statistical Power Analysis for the Behavioral Sciences (2nd ed.), by J. Cohen, 1988, Hillsdale, NJ: Lawrence Erlbaum Associates, Inc. Reprinted with permission.
*Power values here and to the right are > .995.


According to Cohen (1988), interpolation should be performed with respect to the reciprocals of dfe, that is, 1/20, 1/60, 1/120, and 0, respectively. In particular, when the power coefficient is needed for a given λ for a given dfe, where dfe falls between Ldfe and Udfe, the lower and upper values presented in Table 2 and Table 3, the power values PowerL and PowerU are obtained for λ at Ldfe and Udfe. Then, to compute power for the given dfe, the following formula is used:

$$Power = Power_L + \frac{(1/Ldf_e) - (1/df_e)}{(1/Ldf_e) - (1/Udf_e)}\,(Power_U - Power_L). \qquad (5)$$

For Udfe = ∞, 1/Udfe = 0, such that for dfe > 120, the denominator of Equation 5 is 1/120 − 0 ≈ .0083 (Cohen, 1988).
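A small sketch of this interpolation rule (our illustration; the function name is ours):

```python
# Equation 5: interpolate tabled power linearly in 1/df_e between the
# bracketing table rows (Ldf_e below and Udf_e above the observed df_e).
def interpolate_power(power_l, power_u, df_e, ldf_e, udf_e=float("inf")):
    inv_u = 0.0 if udf_e == float("inf") else 1.0 / udf_e
    fraction = (1.0 / ldf_e - 1.0 / df_e) / (1.0 / ldf_e - inv_u)
    return power_l + fraction * (power_u - power_l)

# Tabled powers .12 (df_e = 120) and .12 (df_e = infinity) give .12 at df_e = 199
print(interpolate_power(0.12, 0.12, df_e=199, ldf_e=120))
```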

Virtually all univariate tests of statistical significance fall into one of the following three cases: Case A, Case B, or Case C. Case A, the simplest case, involves testing a set (A) of one or more independent variables. As before, dfi is the number of independent variables contained in the set, and Equation 3 is used, with the effect size being represented by

$$ES = \frac{R^2_{Y \cdot A}}{1 - R^2_{Y \cdot A}}, \qquad (6)$$

where the numerator represents the proportion of variance explained by the variables in Set A and the denominator represents the error variance proportion.

With respect to Case B, the proportion of variance in the dependent variable accounted for by a Set A, over and above what is explained by another Set B, is ascertained. This is represented by $R^2_{Y \cdot A,B} - R^2_{Y \cdot B}$. The error variance proportion is then $1 - R^2_{Y \cdot A,B}$. Equation 3 can then be used, with the effect size being represented by

$$ES = \frac{R^2_{Y \cdot A,B} - R^2_{Y \cdot B}}{1 - R^2_{Y \cdot A,B}}. \qquad (7)$$

The error degrees of freedom for Case B is given by

$$df_e = N - df_i - df_b - 1, \qquad (8)$$

where dfi is the number of variables contained in Set A, and dfb is the number of variables in Set B.

Case C is similar to Case B, inasmuch as the test of interest is the unique contribution of Set A over and above Set B; in Case C, however, a third set of variables, Set C, also is partialed from the error variance, so that the error variance proportion becomes $1 - R^2_{Y \cdot A,B,C}$. Therefore, Equation 3 can then be used, with the effect size being represented by

$$ES = \frac{R^2_{Y \cdot A,B} - R^2_{Y \cdot B}}{1 - R^2_{Y \cdot A,B,C}}. \qquad (9)$$

The error degrees of freedom for Case C is given by

$$df_e = N - df_i - df_b - df_c - 1, \qquad (10)$$

where dfi is the number of variables contained in Set A, dfb is the number of variables in Set B, and dfc is the number of variables in Set C. As noted by Cohen (1988), Case C represents the most general case, with Cases A and B being special cases of Case C.
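The three effect size formulas can be collected into simple helper functions. A sketch (ours; the R-squared inputs below are made-up values for illustration only):

```python
# Effect sizes for Cases A, B, and C from squared multiple correlations
# (Equations 6, 7, and 9); each ES feeds Equation 3 to get the noncentrality.
def es_case_a(r2_ya):
    return r2_ya / (1 - r2_ya)                      # Equation 6

def es_case_b(r2_yab, r2_yb):
    return (r2_yab - r2_yb) / (1 - r2_yab)          # Equation 7

def es_case_c(r2_yab, r2_yb, r2_yabc):
    return (r2_yab - r2_yb) / (1 - r2_yabc)         # Equation 9

# Hypothetical: Set A adds .05 over Set B's .20; Set C brings the model to .30
print(es_case_b(0.25, 0.20), es_case_c(0.25, 0.20, 0.30))
```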

FRAMEWORK FOR CONDUCTING POST HOC POWER ANALYSES

We agree that post hoc power analyses should accompany statistically nonsignificant findings.⁵ In fact, such analyses can provide useful information for replication studies. In particular, the components of the post hoc power analysis can be used to conduct a priori power analyses in subsequent replication investigations.

Figure 1 displays our power-based framework for conducting NHST. Specifically, once the research purpose and hypotheses have been determined, the next step is to use an a priori power analysis to design the study. Once data have been collected, the next step is to test the hypotheses. For each hypothesis, if statistical significance is reached (e.g., at the 5% level), then the researcher should report the effect size and a confidence interval around the effect size (e.g., Bird, 2002; Chandler, 1957; Cumming & Finch, 2001; Fleishman, 1980; Steiger & Fouladi, 1992, 1997; Thompson, 2002). Conversely, if statistical significance is not reached, then the researcher should conduct a post hoc power analysis in an attempt to rule in or to rule out inadequate power (e.g., power < .80) as a threat to the internal validity of the finding.

⁵Moreover, we recommend that upper bounds for post hoc power estimates be computed (Steiger & Fouladi, 1997). This upper bound is estimated via the noncentrality parameter. However, this is beyond the scope of this article.

    HEURISTIC EXAMPLE

Recently, Onwuegbuzie, Witcher, Filer, Collins, and Moore (2003) conducted a study investigating characteristics associated with teachers' views on discipline.



The theoretical framework for this investigation, although not presented here, can be found by examining the original study. Although several independent variables were examined by Onwuegbuzie, Witcher, et al., we restrict our attention to one of them, namely, ethnicity (i.e., Caucasian American vs. minority) and its relationship to discipline styles.

FIGURE 1 Power-based framework for conducting null hypothesis significance tests. The flowchart proceeds: identify research purpose → determine and state hypotheses → conduct a priori power analysis as part of planning the research design → collect data → conduct null hypothesis significance test. If statistical significance is achieved, report the effect size and the confidence interval around the effect size. If not, report and interpret the post hoc power coefficient: if the post hoc power coefficient is low, lack of power is a rival explanation of the statistically nonsignificant finding; if it is not, rule out power as a rival explanation of the statistical nonsignificance.

Participants were 201 students at a large mid-southern university who were either preservice (77.0%) or in-service (23.0%) teachers. The sample size was selected a priori to provide adequate power for detecting a medium effect size (i.e., Cohen's d = .5) at the (two-tailed) .05 level of significance, maintaining a family-wise error of 5% (i.e., approximately .01 for each set of statistical tests comprising the three subscales used; Erdfelder et al., 1996). The preservice teachers were selected from several sections of an introductory-level undergraduate education class. On the other hand, the in-service teachers represented graduate students who were enrolled in one of two sections of a research methodology course.

During the first week of class, participants were administered the Beliefs on Discipline Inventory (BODI), which was developed by Roy T. Tamashiro and Carl D. Glickman (as cited in Wolfgang & Glickman, 1986). This measure was constructed to assess teachers' beliefs on classroom discipline by indicating the degree to which they are noninterventionists, interventionists, and interactionalists. The BODI contains 12 multiple-choice items, each with two response options. For each item, participants are asked to select the statement with which they most agree. The BODI contains three subscales representing the Noninterventionist, Interventionist, and Interactionalist orientations, with scores on each subscale ranging from 0 to 8. A high score on any of these scales represents a teacher's proclivity toward the particular discipline approach. For our study, the Noninterventionist, Interventionist, and Interactionalist subscales generated scores that had classical theory alpha reliability coefficients of .72 (95% confidence interval [CI] = .66, .77), .75 (95% CI = .69, .80), and .94 (95% CI = .93, .95), respectively.

A series of independent t tests, using the Bonferroni adjustment to maintain a family-wise error of 5%, revealed no statistically significant difference between Caucasian American (n = 175) and minority participants (n = 26) for scores on the Interventionist, t(199) = 1.47, p > .05; Noninterventionist, t(199) = 0.88, p > .05; and Interactionalist, t(199) = 0.52, p > .05, subscales. After finding statistical nonsignificance, Onwuegbuzie, Witcher, et al. (2003) could have concluded that there were no ethnic differences in discipline beliefs. However, Onwuegbuzie, Witcher, et al. decided to conduct a post hoc power analysis. The post hoc power analysis for this test of ethnic differences revealed low statistical power. Thus, Onwuegbuzie, Witcher, et al. (2003) concluded the following:

The finding of no ethnic differences in discipline beliefs also is not congruent with Witcher et al. (2001), who reported that minority preservice teachers less often endorsed classroom and behavior management skills as characteristic of effective teachers than did Caucasian-American preservice teachers. Again, the non-significance could have stemmed from the relatively small proportion of minority students (i.e., 12.9%), which induced relatively low statistical power (i.e., 0.59) for detecting a moderate effect size for comparing the two groups with respect to three outcomes (Erdfelder et al., 1996). More specifically, using the actual observed effect sizes pertaining to these three differences, and applying the Bonferroni adjustment, the post-hoc statistical power estimates were .12, .06, and .03 for interventionist, noninterventionist, and interactionalist beliefs, respectively. Replications are thus needed to determine the reliability of the present findings of no ethnic differences in discipline belief. (p. 19)

Although Onwuegbuzie, Witcher, et al. (2003) computed their power coefficients using a statistical power program (i.e., Erdfelder et al., 1996), these estimates could have been calculated by hand using Equation 3 and Tables 2 and 3. For example, to determine the power of the test of racial differences in levels of interventionist orientation, Table 3 would be used (i.e., α = .01) as a result of the Bonferroni adjustment used (i.e., .05/3). The proportion of variance in interventionist orientation accounted for by race (i.e., effect size) can be obtained by using one of the following three transformation formulae:

$$ES = \frac{t^2}{t^2 + df_e}, \qquad (11)$$

$$ES = \frac{df_i\,F}{df_i\,F + df_e}, \qquad (12)$$

$$ES = \frac{d^2}{d^2 + 4}, \qquad (13)$$

where t² is the square of the t value, F is the F value, d² is the square of Cohen's d statistic, and dfi and dfe are the numerator and denominator degrees of freedom, respectively (Murphy & Myors, 1998). Using the t value of 1.47 reported previously by Onwuegbuzie, Witcher, et al. pertaining to the racial difference in interventionist orientation, the associated numerator degrees of freedom (dfi) of 1 (i.e., one independent variable: race), and the corresponding denominator degrees of freedom (dfe) of 199 (i.e., total sample size − 2), from Equation 11 we obtain

$$ES = \frac{(1.47)^2}{(1.47)^2 + 199} = .0107. \qquad (14)$$

Next, we substitute this value of ES and the numerator and denominator degrees of freedom into Equation 3 to yield a noncentrality parameter of λ = .0107(1 + 199 + 1) = 2.15 ≈ 2. From Table 3, we use dfi = 1 and λ = 2. Now, dfe = 199 is between Ldfe = 120 and Udfe = ∞. We could use Equation 5; however, because both the lower and upper power values are .12, our estimated power value will be .12, which represents extremely low statistical power for detecting the small observed effect size of .0107.
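The same worked example can be verified in software by evaluating the noncentral F distribution rather than interpolating in Table 3. A sketch assuming Python with scipy (not the program the authors used); the result agrees closely with the tabled value of .12:

```python
# Post hoc power for the interventionist comparison: observed t value,
# Equation 11 for the effect size, Equation 3 for the noncentrality.
from scipy.stats import f, ncf

t, df_i, df_e, alpha = 1.47, 1, 199, 0.01   # Bonferroni-adjusted alpha (.05/3)

es = t**2 / (t**2 + df_e)                   # Equation 11: ES = .0107
lam = es * (df_i + df_e + 1)                # Equation 3: lambda = 2.15
f_crit = f.ppf(1 - alpha, df_i, df_e)

# Power = proportion of the noncentral F distribution beyond the critical value
power = 1 - ncf.cdf(f_crit, df_i, df_e, lam)
print(round(es, 4), round(lam, 2), round(power, 2))  # ~.0107, ~2.15, ~.13
```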


    If Onwuegbuzie, Witcher, et al. (2003) had stopped after finding statistically

    nonsignificant results, this would have been yet another instance of the deepening

    file-drawer problem (Rosenthal, 1979). The post hoc power analysis allowed thestatistically nonsignificant finding pertaining to ethnicity to be placed in a more

    appropriate context. This in turn helped the researchers to understand that their sta-

    tistically nonsignificant finding was not necessarily due to an absence of a rela-

    tionship between ethnicity and discipline style in the population but was due at

    least in part to a lack of statistical power.

    By conducting a post hoc analysis, Onwuegbuzie, Witcher, et al. (2003) real-

    ized that the conclusion that no ethnic differences prevailed in discipline beliefs

    likely would have resulted in a Type II error being committed. Similarly, if

Onwuegbuzie, Witcher, et al. had bypassed conducting a statistical significance test and had merely computed and interpreted the associated effect size, they

    would have increased the probability of committing a Type A error. Either way,

    conclusion validity would have been threatened.

    As noted by Onwuegbuzie and Levin (2003) and Onwuegbuzie, Levin, et al.

    (2003), the best way to increase conclusion validity is through independent (exter-

    nal) replications of results (i.e., two or more independent studies yielding similar

    findings that produce statistically and practically compatible outcomes). Indeed,

independent replications "make invaluable contributions to the cumulative knowledge in a given domain" (Robinson & Levin, 1997, p. 25). Thus, post hoc analyses can play a very important role in promoting as well as improving the quality of external replications. The post hoc analysis documented previously, by revealing

    low statistical power first and foremost, strengthens the rationale for conducting

    independent replications. Moreover, researchers interested in performing inde-

    pendent replications could use the information from Onwuegbuzie, Witcher, et

al.'s (2003) post hoc analysis for their subsequent a priori power analyses. Indeed,

    as part of their post hoc power analysis, Onwuegbuzie, Witcher, et al. could have

    conducted what we call a what-if post hoc power analysis to determine how many

more cases would have been needed to increase the power coefficient to a more acceptable level (e.g., .80). As such, what-if post hoc power analyses provide useful

    information for future researchers.
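As an illustration, the following sketch, ours rather than the authors', performs such a what-if analysis for the heuristic example; it assumes Python with statsmodels and converts the observed effect size of .0107 to Cohen's d by inverting Equation 13.

```python
# A what-if post hoc power analysis (our sketch; assumes statsmodels).
# Question: how many cases per group would have pushed power to .80?
from math import sqrt
from statsmodels.stats.power import TTestIndPower

es = 0.0107                    # observed effect size from Equation 14
d = sqrt(4 * es / (1 - es))    # invert Equation 13 to get Cohen's d, ~.21

n_per_group = TTestIndPower().solve_power(
    effect_size=d,
    alpha=0.05 / 3,            # the same Bonferroni-adjusted alpha
    power=0.80,
    alternative="two-sided",
)
print(f"d = {d:.2f}; required n per group = {n_per_group:.0f}")
```

The answer, on the order of 480 cases per group under an equal-groups assumption, is sobering; a replication that preserved the original 12.9% minority imbalance would need a larger total sample still.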

    Even novice researchers who are unable or unwilling to conduct a priori power

    analyses can use the post hoc power values to warn them to employ larger samples

    (i.e., n > 201) in independent replication studies than that used in Onwuegbuzie,

Witcher, et al.'s (2003) investigation. Moreover, novice researchers can use infor-

    mation from what-if post hoc power analyses to set their minimum group/sample

    sizes. Although we strongly encourage these researchers to perform a priori power

analyses, we believe that what-if post hoc power analyses place the novice researcher in a better position to select appropriate sample sizes, ones that reduce the likelihood of committing a Type II error. Moreover, confidence intervals could be constructed around what-if post hoc power estimates using the (95%) upper and lower confidence limits around the observed effect size (cf. Bird, 2002). For instance, for the

two-sample t-test context, Hedges and Olkin's (1985) z-based 95% CI6 could be used to determine the upper and lower limits that subsequently would be used to

    derive a confidence interval around the estimated minimum sample size needed to

    obtain adequate power in an independent replication.
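Under stated assumptions, this suggestion might be sketched as follows. The group sizes of 26 and 175 are our reconstruction of the example's 12.9% minority split, the d of .21 comes from Equation 13, and the interval uses Hedges and Olkin's (1985) large-sample variance approximation for d; everything else is illustrative.

```python
# Our sketch of a confidence interval around the minimum replication
# sample size, built from a z-based 95% CI for d. Group sizes are
# illustrative reconstructions, not the authors' reported values.
from math import sqrt
from statsmodels.stats.power import TTestIndPower

d, n1, n2 = 0.21, 26, 175
se_d = sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))  # H&O approximation
d_lo, d_hi = d - 1.96 * se_d, d + 1.96 * se_d                # z-based 95% CI for d

solver = TTestIndPower()
# A larger d needs fewer cases, so the CI limits swap roles:
# the upper limit on d gives the lower limit on n, and vice versa.
for label, es in [("lower limit on n", d_hi), ("upper limit on n", d_lo)]:
    if es <= 0:
        print(f"{label}: unbounded (the interval for d crosses zero)")
    else:
        n = solver.solve_power(effect_size=es, alpha=0.05 / 3, power=0.80)
        print(f"{label}: {n:.0f} per group")
```

With an effect this small and groups this unbalanced, the interval for d crosses zero, so the upper confidence limit on the required sample size is unbounded; that, too, is useful information for anyone planning a replication.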

    Post hoc power analyses also could be used for meta-analytic purposes. In par-

    ticular, if researchers routinely were to conduct post hoc power analyses,

    meta-analysts could determine the average observed power across studies for any

    hypothesis of interest. Knowledge of observed power across studies in turn would

    help researchers interpret the extent to which studies in a particular area were truly

    contributing to the knowledge base. Low average observed power across inde-pendent replications would necessitate research design modifications in future ex-

    ternal replications. Additionally, what we call weighted meta-analyses could be

    conducted wherein effect sizes across studies are weighted by their respective post

    hoc power estimates. For example, if an analysis revealed a post hoc power esti-

    mate of .80, the effect-size estimate of interest would be multiplied by this value to

    yield a power-adjusted, effect-size coefficient. Similarly, a post hoc power esti-

    mate of .60 would adjust the observed effect-size coefficient by this amount. By

    using this technique when conducting meta-analyses, effect sizes resulting from

studies with adequate power would be given more weight than effect sizes derived from investigations with low power.
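The arithmetic is simple enough to show directly; in the sketch below (ours, with hypothetical study values), effect sizes from well-powered studies survive the adjustment largely intact, whereas those from underpowered studies are discounted.

```python
# A minimal sketch of the weighting scheme described above: each observed
# effect size is multiplied by its post hoc power estimate. Study labels
# and values are hypothetical.
studies = [
    ("Study A", 0.45, 0.80),   # (label, observed effect size, post hoc power)
    ("Study B", 0.45, 0.60),
    ("Study C", 0.20, 0.12),
]

for label, es, power in studies:
    adjusted = es * power      # power-adjusted effect-size coefficient
    print(f"{label}: observed = {es:.2f}, adjusted = {adjusted:.2f}")
```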

In addition, if a researcher conducts both an a priori and a post hoc power analysis within the same study, then a power discrepancy index (PDI) could be computed, which represents the difference between the a priori and post hoc power coefficients. The PDI would then provide information to researchers for increasing the sensitivity (i.e., "the degree to which sampling error introduces imprecision into the results of a study"; Murphy & Myors, 1998, p. 5) of future a priori power analyses. Sensitivity might be improved by increasing the sample size in independent replication studies. Alternatively, sensitivity might be increased by using measures that yield more reliable and valid scores or a study design that allows researchers to control for undesirable sources of variability in their data (Murphy & Myors, 1998).
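In symbols (our notation; the index is defined only verbally above), the PDI is simply

$$\mathrm{PDI} = \hat{\pi}_{\text{a priori}} - \hat{\pi}_{\text{post hoc}},$$

where each $\hat{\pi}$ is an estimated power coefficient; large positive values indicate that a study turned out to be less sensitive than its design called for.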

    We have outlined several ways in which post hoc power analyses can be used to

    improve the quality of present and future studies. However, this list is by no means

    exhaustive. Indeed, we are convinced that other uses of post hoc power analyses

are waiting to be uncovered. Regardless of the way in which a post hoc power

    analysis is utilized, it is clear that this approach represents good detective work be-

cause it enables researchers to get to know their data better and consequently puts them in a position to interpret their data in a more meaningful way and with more conclusion validity. Therefore, we believe that post hoc power analyses should play a central role in NHST.

6According to Hess and Kromrey (2002), Hedges and Olkin's (1985) z-based confidence interval …

    SUMMARY AND CONCLUSIONS

Robinson and Levin (1997) proposed a two-step procedure for analyzing empirical data whereby researchers first evaluate the probability of an observed effect (i.e., statistical significance), and if and only if statistical significance is found, then they assess the effect size. Recently, Onwuegbuzie and Levin (2003) proposed a three-step procedure when two or more hypothesis tests are conducted within the same study, which involves testing the trend of the set of hypotheses at the third step. Although both methods are appealing, their effectiveness depends on the statistical power of the underlying hypothesis tests. Specifically, if power is lacking, then the first step of the two-step method and the first and third steps of the three-step procedure, which serve as gatekeepers for computing effect sizes, may lead to the nonreporting of a nontrivial effect (i.e., Type A error; Onwuegbuzie, 2001).

    Because the typical level of power for medium effect sizes in the behavioral and

    social sciences is around .50 (Cohen, 1962), the incidence of Type A error likely is

    high. Clearly, this incidence can be reduced if researchers conduct an a priori

power analysis to select appropriate sample sizes. However, such analyses are rarely employed (Cohen, 1992). Regardless, when a statistically nonsignificant

    finding emerges, researchers should then conduct a post hoc power analysis. This

    would help researchers determine whether low power threatens the internal valid-

    ity of their findings (i.e., Type A error). Moreover, if researchers conduct post hoc

power analyses regardless of whether statistical significance is reached, then meta-analysts

    would be able to assess whether independent replications are making significant

    contributions to the accumulation of knowledge in a given domain. Unfortunately,

    virtually no meta-analyst has formally used this technique.

Thus, in this article, we advocate the use of post hoc power analyses in empirical studies, especially when statistically nonsignificant findings prevail. First, reasons for the nonuse of a priori power analyses were presented. Second, post hoc power was defined and its utility delineated. Third, a step-by-step guide was provided for conducting post hoc power analyses. Fourth, a heuristic example was provided to illustrate how post hoc power can help to rule in/out rival explanations to observed findings. Finally, several methods were outlined that describe how post hoc power analyses can be used to improve the design of independent replications.

    Although we advocate the use of post hoc power analyses, we believe that such

approaches should never be used as a substitute for a priori power analyses. Moreover, we recommend that a priori power analyses always be conducted and reported, and that post hoc power analyses be conducted whenever one or more statistically nonsignificant findings emerge. Post hoc power analyses rely more on available data and less on speculation than do a priori power analyses that are based on hypothesized effect sizes.

Indeed, we agree with Wooley and Dawson (1983), who suggested editorial

policies "to require all such information relating to a priori design considerations and post hoc interpretation to be incorporated as a standard component of any research report submitted for publication" (p. 680). Although it could be argued that

    this recommendation is ambitious, it is no more ambitious than the editorial poli-

    cies at 20 journals that now formally stipulate that effect sizes be reported for all

    statistically significant findings (Capraro & Capraro, 2002). In fact, post hoc

    power provides a nice balance in report writing because we believe that post hoc

power coefficients are to statistically nonsignificant findings as effect sizes are to statistically significant findings. In any case, we believe that such a policy of con-

    ducting and reporting a priori and post hoc power analyses would simultaneously

    reduce the incidence of Type II and Type A errors and subsequently reduce the in-

    cidence of publication bias and the file-drawer problem. This can only help to in-

    crease the accumulation of knowledge across studies because meta-analysts will

have much more information with which to work. This surely would represent a step

    in the right direction.

    REFERENCES

American Psychological Association. (2001). Publication manual of the American Psychological Association (5th ed.). Washington, DC: Author.
Bakan, D. (1966). The test of significance in psychological research. Psychological Bulletin, 66, 423–437.
Bird, K. D. (2002). Confidence intervals for effect sizes in analysis of variance. Educational and Psychological Measurement, 62, 197–226.
Brewer, J. K. (1972). On the power of statistical tests in the American Educational Research Journal. American Educational Research Journal, 9, 391–401.
Brewer, J. K., & Owen, P. W. (1973). A note on the power of statistical tests in the Journal of Educational Measurement. Journal of Educational Measurement, 10, 71–74.
Cahan, S. (2000). Statistical significance is not a kosher certificate for observed effects: A critical analysis of the two-step approach to the evaluation of empirical results. Educational Researcher, 29(1), 31–34.
Capraro, R. M., & Capraro, M. M. (2002). Treatments of effect sizes and statistical significance tests in textbooks. Educational and Psychological Measurement, 62, 771–782.
Carver, R. P. (1978). The case against statistical significance testing. Harvard Educational Review, 48, 378–399.
Carver, R. P. (1993). The case against statistical significance testing, revisited. Journal of Experimental Education, 61, 287–292.
Chandler, R. E. (1957). The statistical concepts of confidence and significance. Psychological Bulletin, 54, 429–430.
Chase, L. J., & Baran, S. J. (1976). An assessment of quantitative research in mass communication.

Chase, L. J., & Chase, R. B. (1976). A statistical power analysis of applied psychological research. Journal of Applied Psychology, 61, 234–237.
Chase, L. J., & Tucker, R. K. (1975). A power-analytical examination of contemporary communication research. Speech Monographs, 42(1), 29–41.
Chase, L. J., & Tucker, R. K. (1976). Statistical power: Derivation, development, and data-analytic implications. The Psychological Record, 26, 473–486.
Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. Journal of Abnormal and Social Psychology, 65, 145–153.
Cohen, J. (1965). Some statistical issues in psychological research. In B. B. Wolman (Ed.), Handbook of clinical psychology (pp. 95–121). New York: McGraw-Hill.
Cohen, J. (1968). Multiple regression as a general data-analytic system. Psychological Bulletin, 70, 426–443.
Cohen, J. (1969). Statistical power analysis for the behavioral sciences. New York: Academic.
Cohen, J. (1973). Statistical power analysis and research results. American Educational Research Journal, 10, 225–230.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences (Rev. ed.). New York: Academic.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003.
Cohen, J. (1997). The earth is round (p < .05). In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 21–35). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Cooper, H. M., & Findley, M. (1982). Expected effect sizes: Estimates for statistical power analysis in social psychology. Personality and Social Psychology Bulletin, 8, 168–173.
Cumming, G., & Finch, S. (2001). A primer on the understanding, use and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61, 532–575.
Daly, J., & Hexamer, A. (1983). Statistical power in research in English education. Research in the Teaching of English, 17, 157–164.
Dayton, C. M., Schafer, W. D., & Rogers, B. G. (1973). On appropriate uses and interpretations of power analysis: A comment. American Educational Research Journal, 10, 231–234.
Erdfelder, E., Faul, F., & Buchner, A. (1996). GPOWER: A general power analysis program. Behavior Research Methods, Instruments, & Computers, 28, 1–11.
Fagley, N. S. (1985). Applied statistical power analysis and the interpretation of nonsignificant results by research consumers. Journal of Counseling Psychology, 32, 391–396.
Fagley, N. S., & McKinney, I. J. (1983). Reviewer bias for statistically significant results: A reexamination. Journal of Counseling Psychology, 30, 298–300.
Fidler, F. (2002). The fifth edition of the APA Publication Manual: Why its statistics recommendations are so controversial. Educational and Psychological Measurement, 62, 749–770.
Fisher, R. A. (1941). Statistical methods for research workers (8th ed.). Edinburgh, Scotland: Oliver & Boyd. (Original work published 1925)
Fleishman, A. I. (1980). Confidence intervals for correlation ratios. Educational and Psychological Measurement, 40, 659–670.
Greenwald, A. G. (1975). Consequences of prejudice against the null hypothesis. Psychological Bulletin, 82, 1–20.
Guttman, L. B. (1985). The illogic of statistical inference for cumulative science. Applied Stochastic Models and Data Analysis, 1, 3–10.
Haase, R. F. (1974). Power analysis of research in counselor education. Counselor Education and Supervision.

Haase, R. F., Waechter, D. M., & Solomon, G. S. (1982). How significant is a significant difference? Average effect size of research in counseling psychology. Journal of Counseling Psychology, 29, 58–65.
Halpin, R., & Easterday, K. E. (1999). The importance of statistical power for quantitative research: A post-hoc review. Louisiana Educational Research Journal, 24, 5–28.
Harris, R. J. (1997). Reforming significance testing via three-valued logic. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 145–174). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. New York: Academic.
Hess, M. R., & Kromrey, J. D. (2002, April). Interval estimates of effect size: An empirical comparison of methods for constructing confidence bands around standardized mean differences. Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA.
Katzer, J., & Sodt, J. (1973). An analysis of the use of statistical testing in communication research. The Journal of Communication, 23, 251–265.
Keselman, H. J., Huberty, C. J., Lix, L. M., Olejnik, S., Cribbie, R. A., Donahue, B., et al. (1998). Statistical practices of educational researchers: An analysis of their ANOVA, MANOVA, and ANCOVA analyses. Review of Educational Research, 68, 350–386.
Kroll, R. M., & Chase, L. J. (1975). Communication disorders: A power analytical assessment of recent research. Journal of Communication Disorders, 8, 237–247.
Levenson, R. L. (1980). Statistical power analysis: Implications for researchers, planners, and practitioners in gerontology. Gerontologist, 20, 494–498.
Levin, J. R., & Robinson, D. H. (2000). Statistical hypothesis testing, effect-size estimation, and the conclusion coherence of primary research studies. Educational Researcher, 29(1), 34–36.
Loftus, G. R. (1996). Psychology will be a much better science when we change the way we analyze data. Current Directions in Psychological Science, 5, 161–171.
McNemar, Q. (1960). At random: Sense and nonsense. American Psychologist, 15, 295–300.
Meehl, P. E. (1967). Theory testing in psychology and physics: A methodological paradox. Philosophy of Science, 34, 103–115.
Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806–834.
Morse, D. T. (2001, November). YAPP: Yet another power program. Paper presented at the annual meeting of the Mid-South Educational Research Association, Chattanooga, TN.
Mulaik, S. A., Raju, N. S., & Harshman, R. A. (1997). There is a time and a place for significance testing. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 65–115). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Mundfrom, D. J., Shaw, D. G., Thomas, A., Young, S., & Moore, A. D. (1998, April). Introductory graduate research courses: An examination of the knowledge base. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.
Murphy, K. R., & Myors, B. (1998). Statistical power analysis: A simple and general model for traditional and modern hypothesis tests. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Neyman, J., & Pearson, E. S. (1928). On the use and interpretation of certain test criteria for the purposes of statistical inference. Biometrika, 20A, 175–240, 263–294.
Neyman, J., & Pearson, E. S. (1933a). On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London, Series A, 231, 289–337.
Neyman, J., & Pearson, E. S. (1933b). Testing statistical hypotheses in relation to probabilities a priori. Proceedings of the Cambridge Philosophical Society, 29, 492–510.
Nix, T. W., & Barnette, J. J. (1998). The data analysis dilemma: Ban or abandon. A review of null hypothesis significance testing. Research in the Schools, 5, 3–14.

Onwuegbuzie, A. J. (2002). Common analytical and interpretational errors in educational research: An analysis of the 1998 volume of the British Journal of Educational Psychology. Educational Research Quarterly, 26, 11–22.
Onwuegbuzie, A. J. (2003). Expanding the framework of internal and external validity in quantitative research. Research in the Schools, 10, 71–90.
Onwuegbuzie, A. J., & Daniel, L. G. (2002). A framework for reporting and interpreting internal consistency reliability estimates. Measurement and Evaluation in Counseling and Development, 35, 89–103.
Onwuegbuzie, A. J., & Daniel, L. G. (2003, February 12). Typology of analytical and interpretational errors in quantitative and qualitative educational research. Current Issues in Education, 6(2) [Online]. Retrieved October 10, 2004, from http://cie.ed.asu.edu/volume6/number2/
Onwuegbuzie, A. J., & Daniel, L. G. (2004). Reliability generalization: The importance of considering sample specificity, confidence intervals, and subgroup differences. Research in the Schools, 11(1), 61–72.
Onwuegbuzie, A. J., & Levin, J. R. (2003). A new proposed three-step method for testing result direction. Manuscript in preparation.
Onwuegbuzie, A. J., & Levin, J. R. (2003). Without supporting statistical evidence, where would reported measures of substantive importance lead? To no good effect. Journal of Modern Applied Statistical Methods, 2, 133–151.
Onwuegbuzie, A. J., Levin, J. R., & Leech, N. L. (2003). Do effect-size measures measure up? A brief assessment. Learning Disabilities: A Contemporary Journal, 1, 37–40.
Onwuegbuzie, A. J., Witcher, A. E., Filer, J., Collins, K. M. T., & Moore, J. (2003). Factors associated with teachers' beliefs about discipline in the context of practice. Research in the Schools, 10(2), 35–44.
Orme, J. G., & Tolman, R. M. (1986). The statistical power of a decade of social work education research. Social Science Review, 60, 619–632.
Overall, J. E. (1969). Classical statistical hypotheses within the context of Bayesian theory. Psychological Bulletin, 71, 285–292.
Penick, J. E., & Brewer, J. K. (1972). The power of statistical tests in science teaching research. Journal of Research in Science Teaching, 9, 377–381.
Robinson, D. H., & Levin, J. R. (1997). Reflections on statistical and substantive significance, with a slice of replication. Educational Researcher, 26(5), 21–26.
Rosenthal, R. (1979). The file-drawer problem and tolerance for null results. Psychological Bulletin, 86, 638–641.
Rossi, J. S. (1990). Statistical power of psychological research: What have we gained in 20 years? Journal of Consulting and Clinical Psychology, 58, 646–656.
Rossi, J. S. (1997). A case study in the failure of psychology as a cumulative science: The spontaneous recovery of verbal learning. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 175–197). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Rozeboom, W. W. (1960). The fallacy of the null hypothesis significance test. Psychological Bulletin, 57, 416–428.
SAS Institute, Inc. (2002). SAS/STAT user's guide (Version 8.2) [Computer software]. Cary, NC: Author.
Sawyer, A. G., & Ball, A. D. (1981). Statistical power and effect size in marketing research. Journal of Marketing Research, 18, 275–290.
Schmidt, F. L. (1992). What do data really mean? American Psychologist, 47, 1173–1181.
Schmidt, F. L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for the training of researchers. Psychological Methods, 1, 115–129.

Schmidt, F. L., & Hunter, J. E. (1997). Eight common but false objections to the discontinuation of significance testing in the analysis of research data. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 37–64). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Schmidt, F. L., Hunter, J. E., & Urry, V. E. (1976). Statistical power in criterion-related validation studies. Journal of Applied Psychology, 61, 473–485.
Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105, 309–316.
Sherron, R. H. (1988). Power analysis: The other half of the coin. Community/Junior College Quarterly, 12, 169–175.
SPSS, Inc. (2001). SPSS 11.0 for Windows [Computer software]. Chicago: Author.
Steiger, J. H., & Fouladi, R. T. (1992). R2: A computer program for interval estimation, power calculation, and hypothesis testing for the squared multiple correlation. Behavior Research Methods, Instruments, & Computers, 24, 581–582.
Steiger, J. H., & Fouladi, R. T. (1997). Noncentrality interval estimation and the evaluation of statistical models. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 221–257). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Thompson, B. (2002). What future quantitative social science research could look like: Confidence intervals for effect sizes. Educational Researcher, 31(3), 25–32.
Tversky, A., & Kahneman, D. (1971). Belief in the law of small numbers. Psychological Bulletin, 76, 105–110.
Wilkinson, L., & The Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594–604.
Witcher, A. E., Onwuegbuzie, A. J., & Minor, L. C. (2001). Characteristics of effective teachers: Perceptions of preservice teachers. Research in the Schools, 8(2), 45–57.
Wolfgang, C. H., & Glickman, C. D. (1986). Solving discipline problems: Strategies for classroom teachers (2nd ed.). Boston: Allyn & Bacon.
Wooley, T. W., & Dawson, G. O. (1983). A follow-up power analysis of the statistical tests used in the Journal of Research in Science Teaching. Journal of Research in Science Teaching, 20, 673–681.