
Bordens and Abbott 2008



Using Inferential Statistics

CHAPTER OUTLINE
Inferential Statistics: Basic Concepts
  Sampling Distribution
  Sampling Error
  Degrees of Freedom
  Parametric Versus Nonparametric Statistics
The Logic Behind Inferential Statistics
  Statistical Errors
  Statistical Significance
  One-Tailed Versus Two-Tailed Tests
Parametric Statistics
  Assumptions Underlying a Parametric Statistic
  Inferential Statistics With Two Samples
  The t Test
  An Example From the Literature: Contrasting Two Groups
  The z Test for the Difference Between Two Proportions
  Beyond Two Groups: Analysis of Variance (ANOVA)
  The One-Factor Between-Subjects ANOVA
  The One-Factor Within-Subjects ANOVA
  The Two-Factor Between-Subjects ANOVA
  The Two-Factor Within-Subjects ANOVA
  Mixed Designs
  Higher-Order and Special-Case ANOVAs
Nonparametric Statistics
  Chi-Square
  The Mann-Whitney U Test
  The Wilcoxon Signed Ranks Test
  Parametric Versus Nonparametric Statistics
Special Topics in Inferential Statistics
  Power of a Statistical Test
  Statistical Versus Practical Significance
  The Meaning of the Level of Significance
  Data Transformations
  Alternatives to Inferential Statistics
Summary
Review Questions
Key Terms

Chapter 13 reviewed descriptive statistics that help you characterize and describe your data. However, they do not help you assess the reliability of your findings. A reliable finding is repeatable, whereas an unreliable one may not be. Statistics that assess the reliability of your findings are called inferential statistics because they let you infer the characteristics of a population from the characteristics of the samples comprising your data.

This chapter reviews the most widely used inferential statistics. Rather than focusing on how to calculate these statistics, this discussion focuses on issues of application and interpretation. Consequently, computational formulas and fully worked examples are not presented.

INFERENTIAL STATISTICS: BASIC CONCEPTS

Before exploring some of the more popular inferential statistics, we present some of the basic concepts underlying these statistics. You should understand these concepts before tackling the discussion on inferential statistics that follows. If you need a more comprehensive refresher on these concepts, consult a good introductory statistics text.

Sampling Distribution

Chapter 13 introduced the notion of a distribution of scores. Such a distribution results from collecting data across a series of observations and then plotting the frequency of each score or range of scores. It is also possible to create a distribution by repeatedly taking samples of a given size (e.g., n = 10 scores) from the population. The means of these samples could be used to form a distribution of sample means. If you could take every possible sample of n scores from the population, you would have what is known as the sampling distribution of the mean. Statistical theory reveals that this distribution will tend to closely approximate the normal distribution, even when the population of scores from which the samples were drawn is far from normal in shape.


Thus, you can use the normal distribution as a theoretical model that will allow you to make inferences about the likely value of the population mean, given the mean of a single sample from that population.

The sample mean is not the only statistic for which you can obtain a sampling distribution. In fact, each sample statistic has its own theoretical sampling distribution. For example, the tabled values for the z statistic, Student's t, the F ratio, and chi-square represent the sampling distributions of those statistics. Using these sampling distributions, you can determine the probability that a value of a statistic as large as or larger than the obtained value could have occurred by chance. This probability is called the obtained p.

Sampling Error

When you draw a sample from a population of scores, the mean of the sample, M, will probably differ from the population mean, μ. An estimate of the amount of variability in the expected sample means across a series of such samples is provided by the standard error of the mean (or standard error for short). It may be calculated from the standard deviation of the sample as follows:

    s_M = s / √n

where s is the standard deviation of the sample and n is the number of scores in the sample. The standard error is used to estimate the standard deviation of the sampling distribution of the mean for the population from which the sample was drawn.

Degrees of Freedom

In any distribution of scores with a known mean, a limited number of data points yield independent information. For example, if you have a sample of 10 scores and a known mean (e.g., 6.5), only 9 scores are free to vary. That is, once you have selected 9 scores from the population, the 10th must have a particular value that will yield the mean. Thus, the degrees of freedom (df) for a single sample are n - 1 (where n is the total number of scores in the sample).

Degrees of freedom come into play when you use any inferential statistic. You can extend this logic to the analysis of an experiment. If you have three groups in your experiment with means of 2, 5, and 10, the grand mean (the sum of all the scores divided by n) is then 5.7. If you know the grand mean and you know the means from two of your groups, the final mean is set. Hence, the degrees of freedom for a three-group experiment are k - 1 (where k is the number of levels of the independent variable). The degrees of freedom are then used to find the appropriate tabled value of a statistic against which the computed value is compared.

Parametric Versus Nonparametric Statistics

Inferential statistics can be classified as either parametric or nonparametric. A parameter in this context is a characteristic of a population, whereas a statistic is a characteristic of your sample (Gravetter & Wallnau, 2007). A parametric statistic estimates the value of a population parameter from the characteristics of a sample. When you use a parametric statistic, you are making certain assumptions about the population from which your sample was drawn. A key assumption of a parametric test is that your sample was drawn from a normally distributed population.

In contrast to a parametric statistic, a nonparametric statistic makes no assumptions about the distribution of scores underlying your sample. Nonparametric statistics are used if your data do not meet the assumptions of a parametric test.

THE LOGIC BEHIND INFERENTIAL STATISTICS

Whenever you conduct an experiment, you expose subjects to different levels of your independent variable. Although a given experiment may contain several groups, assume for the present discussion that the experiment in question includes only two. The data from each group can be viewed as a sample of the scores obtained if all subjects in the target population were tested under the conditions to which the group was exposed. For example, the treatment group mean represents a population of subjects exposed to your experimental treatment. Each treatment mean is assumed to represent the mean of the underlying population.

In all respects except for treatment, the treatment and control groups were exposed to equivalent conditions. Assume that the treatment had no effect on the scores. In that case, each group's scores could be viewed as an independent sample taken from the same population. Figure 14-1 illustrates this situation.

FIGURE 14-1 Line graphs showing the relationship between samples and population, assuming that the treatment had no effect on the dependent variable (M1, mean of Sample 1; M2, mean of Sample 2).


Each sample mean provides an independent estimate of the population mean. Each sample standard error provides an independent estimate of the standard deviation of sample means in the sampling distribution of means. Because the two means were drawn from the same population, you would expect them to differ only because of sampling error. You have estimates of the mean of this sampling distribution (the sample means) and of the standard deviation of this distribution (the standard errors). From this information, you can calculate the probability that the two sample means would differ as much as or more than they do simply because of chance factors. This probability is the obtained p.

Let's review these points. If the treatment had no effect on the scores, then you would expect the scores from the two groups to provide independent samples from the same population. From these samples, you can estimate the characteristics of that population; from these estimates, you can determine the likelihood of obtaining the observed difference between the two treatment means.

Consider the case in which the treatment does affect the scores, perhaps by shifting them upward. Figure 14-2 illustrates this situation. In the upper part of the figure are a population underlying the control group sample distribution and another one underlying the treatment group sample distribution. The population distribution underlying the treatment group is shifted upward and away from the control group population distribution. This shift could be obtained by simply adding a constant to each value in the control group distribution. This new shifted distribution resembles the old unshifted distribution in standard deviation, but its mean is higher.


FIGURE 14-2 Line graphs showing the relationship between samples and population, assuming that the treatment had an effect on the dependent variable.

The bottom part of the figure shows two possible sample distributions, one for the control group and one for the treatment group. The scores from the control group still constitute a sample from the unshifted distribution (left-hand upper curve in Figure 14-2), but the scores from the treatment group now constitute a sample from the shifted distribution (right-hand upper curve in Figure 14-2). The two sample means provide estimates of two different population means. Because of sampling error, the two sample means might or might not differ even though a difference exists between the underlying population means.

Your problem (as a researcher) is that you do not know whether the treatment really had an effect on the scores. You must decide this based on your observed sample means (which may differ by a certain amount) and the sample standard deviations. From this information, you must decide whether the two sample means were drawn from the same population (the treatment had no effect on the sample scores) or from two different populations (the treatment shifted the scores relative to scores from the control group). Inferential statistics help you make this decision.

These two possibilities (different or the same populations) can be viewed as statistical hypotheses to be tested. The hypothesis that the means were drawn from the same population (i.e., μ1 = μ2) is referred to as the null hypothesis (H0). The hypothesis that the means were drawn from different populations (μ1 ≠ μ2) is called the alternative hypothesis (H1).

Inferential statistics use the characteristics of the two samples to evaluate the validity of the null hypothesis. Put another way, they assess the probability that the means of the two samples would differ by the observed amount or more if they had been drawn from the same population of scores. If this probability is sufficiently small (i.e., if it is very unlikely that two samples this different would be drawn by chance from the same population), then the difference between the sample means is said to be statistically significant, and the null hypothesis is rejected.

>

Concrol

group

population

"

rco

LL

Treatment

group

populatio

THELOGIC BEHIND INFERENTIALSTATISTICS

11

>

."

d"

L

Control

group

sample

11

Treatment

group

sample

M

421

Statistical Errors


When making a comparison between two sample means, there are two possible states of affairs (the null hypothesis is true or it is false) and two possible decisions you can make (not to reject the null hypothesis or to reject it). In combination, these conditions lead to four possible outcomes, as shown in Table 14-1. The labels across the top of Table 14-1 indicate the two states of affairs, and those in the left-hand column indicate the two possible decisions. Each box represents a different combination of the two conditions.

TABLE 14-1  Statistical Errors

                          TRUE STATE OF AFFAIRS
  DECISION                H0 TRUE             H0 FALSE
  Reject H0               Type I error        Correct decision
  Do not reject H0        Correct decision    Type II error

The lower left-hand box represents the situation in which the null hypothesis is true (the independent variable had no effect), and you correctly decide not to reject the null hypothesis. This is a disappointing outcome, but at least you made the right decision.

The upper left-hand box represents a more disturbing outcome. Here the null hypothesis is again true, but you have incorrectly decided to reject the null hypothesis. In other words, you decided that your independent variable had an effect when in fact it did not. In statistics this mistake is called a Type I error. In signal-detection experiments, the same kind of mistake is called a "false alarm" (saying that a stimulus was present when actually it was not).

The lower right-hand box represents the opposite situation. In this case, the null hypothesis is false (the independent variable did have an effect), but you have incorrectly decided not to reject the null hypothesis. This is called a Type II error. In other words, you decided that your independent variable had no effect when it really did have one. In signal-detection experiments, such an outcome is called a "miss" (not detecting a stimulus that was present).

Ideally, you would like to avoid both types of error. Unfortunately, steps taken to reduce the probability of a Type I error actually increase the probability of a Type II error, and vice versa.

Statistical Significance

If both samples came from the same population (or from populations having the same mean), then the null hypothesis is true, and any difference between the sample means reflects nothing more than sampling error. The actual difference between your sample means may be just such a chance difference, or it may reflect a real difference between the means of the populations from which the samples were drawn. Which of these is the case? To help you decide, you can compute an inferential statistic to determine the probability of obtaining a difference between sample means as large as or larger than the difference you actually got, under the assumption that the null hypothesis is true. If this probability is sufficiently small, you reject the null hypothesis because you would have been unlikely to obtain the difference by chance.

To estimate this probability, you calculate an observed value of your inferential statistic. This observed value is compared to a critical value of that statistic (normally found in a statistical table such as those in the Appendix). Whether or not the observed value of the statistic meets or exceeds the critical value determines whether your result is statistically significant. As stated, you want to be able to reduce the probability of committing a Type I error. The probability of committing a Type I error depends on the criterion you use to accept or reject the null hypothesis. This criterion, known as the alpha level (α), represents the probability that a difference at least as large as the observed difference could have been produced by sampling error alone. The alpha level that you adopt (along with the degrees of freedom) also determines the critical value of the statistic that you are using.

You could ensure that you make fewer than one Type I error in 1 million experiments by choosing an alpha level of .000001. There are good reasons, discussed later, why you do not ordinarily adopt such a conservative alpha level. By convention, the minimum acceptable alpha level in psychological research is .05.

A difference between means yielding an observed value of a statistic that meets or exceeds the critical value of your inferential statistic is said to be statistically significant. The strategy of looking up the critical value of a statistic in a table and comparing the obtained value with this critical value was developed in an era when exact probabilities were impractical to compute. Most statistical software now provides the exact probability value p along with the obtained value of the test statistic. You can compare this obtained p directly with your chosen alpha level and avoid having to use the relevant table. If the obtained p is less than or equal to alpha, the comparison is statistically significant.

One-Tailed Versus Two-Tailed Tests

The critical values of a statistic depend on such factors as the number of observations (degrees of freedom) and the alpha level. They also depend on whether the test is one-tailed or two-tailed.

Figure 14-3 shows critical regions for two tests of the mean. The left distribution shows the critical region (shaded area) for a one-tailed test, assuming an alpha level set to .05. This region contains 5% of the total area under the curve, representing the 5% of cases whose z scores occur by chance with a probability of .05 or less. Values of the statistic falling within the critical region are judged to be statistically significant.

You would conduct a one-tailed test if you were interested only in whether the value of the statistic falls in one tail of the sampling distribution for that statistic. This is usually the case when your research hypotheses are directional.

FIGURE 14-3 Graphs showing critical regions for one-tailed (critical z = 1.65) and two-tailed (critical z = ±1.96) tests of statistical significance.

For example, you may want to know whether a new therapy is measurably better than the standard one. However, if the new therapy is not better, then you really do not care whether it is simply as good as the standard method or is actually worse. You would not use it in either case.

In contrast, you would conduct a two-tailed test if you wanted to know whether the new therapy was either better or worse than the standard method. In that case, you need to check whether your obtained statistic falls into either tail of the distribution.

The major implication of all this is that for a given alpha level you must obtain a greater difference between the means of your two treatment groups to reach statistical significance if you use a two-tailed test than if you use a one-tailed test. The one-tailed test is therefore more likely to detect a real difference if one is present (i.e., it is more powerful). However, using the one-tailed test means giving up any information about the reliability of a difference in the other, untested direction.

The use of one-tailed versus two-tailed tests has been a controversial topic among statisticians. Strictly speaking, you must choose which version you will use before you see the data. You must base your decision on such factors as practical considerations (as in the therapy example), your hypothesis, or previous knowledge. If you wait until after you have seen the data and then base your decision on the direction of the obtained outcome, your actual probability of falsely rejecting the null hypothesis will be greater than the stated alpha value. You have used information contained in the data to make your decision, but that information may itself be the result of chance processes and unreliable.

If you conduct a two-tailed test and then fail to obtain a statistically significant result, the temptation is to find some excuse why you "should have done" a one-tailed test. You can avoid this temptation if you adopt the following rule of thumb: Always use a two-tailed test unless there are compelling a priori reasons not to.
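As a quick check on the values shown in Figure 14-3, here is a minimal sketch (our own illustration; the use of SciPy is an assumption, not something the text prescribes) computing the one- and two-tailed critical z values at α = .05:

    # Critical z values at alpha = .05. stats.norm.ppf returns the z score
    # below which a given proportion of the standard normal curve falls.
    from scipy import stats

    alpha = 0.05
    z_one_tailed = stats.norm.ppf(1 - alpha)       # all of alpha in one tail
    z_two_tailed = stats.norm.ppf(1 - alpha / 2)   # alpha split across both tails
    print(f"one-tailed critical z = {z_one_tailed:.2f}")   # 1.64 (figure rounds to 1.65)
    print(f"two-tailed critical z = {z_two_tailed:.2f}")   # 1.96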


PARAMETRIC STATISTICS

As noted earlier, there are two types of inferential statistics: parametric and nonparametric. The type that you apply to your data depends on the scale of measurement used and how your data are distributed. This section discusses parametric inferential statistics.

Assumptions Underlying a Parametric Statistic

Three assumptions underlie parametric inferential tests (Gravetter & Wallnau, 2007): (1) the scores have been sampled randomly from the population, (2) the sampling distribution of the mean is normal, and (3) the within-groups variances are homogeneous. Assumption 3 means the variances of the different groups are highly similar. In statistical inference, the independent variable is assumed to affect the mean but not the variance.

Serious violation of one or more of these assumptions may bias the statistical test. Such bias will lead you to commit a Type I error either more or less often than the stated alpha probability and thus undermine the value of the statistic as a guide to decision making. The effects of violations of these assumptions are examined later in more detail during a discussion of the statistical technique known as the analysis of variance.

Inferential Statistics With Two Samples

Imagine that you have conducted a two-group experiment on whether "death-qualifying" a jury (i.e., removing any jurors who could not vote for the death penalty) affects how simulated jurors perceive a criminal defendant. Participants in your experimental group were death qualified, whereas those in your control group were not. Participants then rated on a scale from 0 to 10 the likelihood that the defendant was guilty as charged of the crime. You run your experiment and then compute a mean for each group. You find the two means differ from one another (the experimental group mean is 7.2, and the control group mean is 4.9).

Your means may represent a single population and differ only because of sampling error. Or your means may reliably represent two different populations. Your task is to determine which of these two conditions is true. Is the observed difference between means reliable, or does it merely reflect sampling error? This question can be answered by applying the appropriate statistical test, which in this case is a t test.

The t Test

The t test is used when your experiment includes only two levels of the independent variable (as in the jury example). Special versions of the t test exist for designs involving independent samples (e.g., randomized groups) and for those involving correlated samples (e.g., matched-pairs designs and within-subjects designs).

The t Test for Independent Samples  You use the t test for independent samples when you have data from two groups of participants who were assigned at random to the two groups. The test comes in two versions, depending on the error term selected. The unpooled version computes an error term based on the standard error of the mean provided separately by each sample. The pooled version computes an error term based on the two samples combined, under the assumption that both samples come from populations having the same variance. The pooled version may be more sensitive to the effect of the independent variable, but it should be avoided if there are large differences in sample sizes and standard errors. Under these conditions, the probability estimates provided by the pooled version may be misleading.



The t Test for Correlated Samples  When the two means being compared come from samples that are not independent of one another, the formula for the t test must be modified. Correlated samples result when you take two observations from each participant or from single observations taken on each of a matched pair of participants. Within-subjects and matched-pairs designs meet this requirement.

The t test for correlated samples produces a larger t value than a t test for independent samples applied to the same data if the scores from the two samples are at least moderately correlated. This advantage is somewhat offset by the correlated samples t test's smaller degrees of freedom (equal to n - 1, where n is the number of pairs of scores). If the samples are uncorrelated, the correlated samples and independent samples t tests (pooled version) are identical; with its reduced degrees of freedom, the correlated samples t test is then less able than the independent samples t test to detect any effect of the independent variable.
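A minimal sketch with SciPy (assumed tooling; the paired scores are invented):

    # Correlated (paired) samples t test; df = number of pairs - 1.
    from scipy import stats

    before = [10, 12, 9, 11, 13, 10, 12, 11]
    after = [12, 14, 10, 13, 14, 12, 13, 13]

    t, p = stats.ttest_rel(before, after)
    print(f"t({len(before) - 1}) = {t:.2f}, p = {p:.4f}")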

An Example From the Literature: Contrasting Two Groups  Spinal cord injuries (SCI) represent a major source of physical disabilities. Such injuries often involve rapid deceleration of the body and may result in mild traumatic brain injury (MTBI). Hess et al. note that when a patient with an SCI is rushed into treatment, an accompanying MTBI may go undetected, leaving the patient to cope with unexplained difficulties with processing information (Hess et al., 2003). The problem is that it is sometimes difficult to separate the cognitive effects of MTBI from the emotional trauma associated with SCI.

David Hess, Jennifer Marwitz, and Jeffrey Kreutzer (2003) conducted a study to determine whether neuropsychological tests could differentiate between patients with MTBI (without SCI) and patients with SCI. Participants were patients with SCI or MTBI who had been treated at a medical center. They completed tests assessing attention (two tests), motor speed, verbal learning, verbal memory (two tests), visuospatial skills, and word fluency. Mean scores were computed on each measure, and the two groups were compared. The results showed that, as a rule, patients with SCI performed better than patients with MTBI. Table 14-2 shows the means and t values for the five measures on which the groups differed significantly.


TABLE 14-2  Means and t Values From the Five Significant Differences Found by Hess et al. (2003)

  TEST                               SCI     MTBI    t (df)
  Written attention test             41.6    30.4     2.40 (18)
  Motor speed                        91.4   126.1    -2.20 (31)
  Verbal learning                    47.1    37.9     2.40 (34)
  Verbal memory (immediate recall)   25.9    18.7     3.16 (49)
  Verbal memory (delayed recall)     21.4    10.7     4.73 (44)

As presented, the data in Table 14-2 do not make much sense. All that you have are means and a t value (with its degrees of freedom) for each measure. You must decide if the t values are large enough to warrant a conclusion that the observed differences are statistically significant.

After calculating a t score, you compare its value with a critical value of t found in Table 2 of the Appendix. Before you can evaluate your obtained t value, however, you must obtain the degrees of freedom (for the between-subjects t test, df = N - 2, where N is the total number of subjects in the experiment).

Once you have obtained the degrees of freedom (these are shown in parentheses in the fourth column of Table 14-2), you compare the obtained t score with the tabled critical value, a process requiring two steps. In Table 2 of the Appendix, first read down the column labeled "Degrees of Freedom" and find the number matching your degrees of freedom. Next, find the column corresponding to the desired alpha level (labeled "Alpha Level"). The critical value of t is found at the intersection of the degrees of freedom (row) and alpha level (column) of your test. If your obtained t score is equal to or greater than the tabled t score, then the difference between your sample means is statistically significant at the selected alpha level.

In some instances, you may find that the table you have does not include the degrees of freedom that you have calculated (e.g., 44). If this occurs, you can use the next lower degrees of freedom in the table. With 44 degrees of freedom, you would use the entry for 40 degrees of freedom in the table.

If you are conducting your t tests on a computer, most statistical packages will compute the exact p value for the test, given the obtained t and degrees of freedom. In that case, simply compare your obtained p values to your chosen alpha level. If p is less than or equal to alpha, the difference between your groups is statistically significant at the stated alpha level.
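As a sketch of the same decision carried out in software rather than from the printed table (SciPy is our assumed stand-in), using the last row of Table 14-2:

    # Critical value and exact p for an obtained t with df = 44.
    from scipy import stats

    t_obtained, df, alpha = 4.73, 44, 0.05
    t_critical = stats.t.ppf(1 - alpha / 2, df)    # two-tailed critical value
    p = 2 * stats.t.sf(abs(t_obtained), df)        # exact two-tailed p
    print(f"critical t({df}) = {t_critical:.3f}")  # ~2.015
    print(f"obtained p = {p:.6f}")                 # well below .05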

The z Test for the Difference Between Two Proportions

In some research, you may have to determine whether two proportions are significantly different. In a jury simulation in which participants return verdicts of guilty or not guilty, for example, your dependent variable might be expressed as the proportion of participants who voted guilty. A relatively easy way to analyze data of this type is to use a z test for the difference between two proportions.


The logic of this test is essentially the same as for the t tests: the difference between the two proportions is evaluated against an estimate of error variance.
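A minimal sketch of the computation (the verdict counts below are hypothetical, and the pooled-proportion error term is one common formulation of this test):

    # z test for the difference between two independent proportions.
    import math

    x1, n1 = 30, 50   # guilty votes / jurors, death-qualified group
    x2, n2 = 18, 50   # guilty votes / jurors, control group

    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)                # pooled proportion
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    print(f"z = {z:.2f}")   # compare with +/-1.96 for a two-tailed test at alpha = .05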

Beyond Two Groups: Analysis of Variance (ANOVA)

When your experiment includes more than two groups, the statistical test of choice is the analysis of variance (ANOVA). As the name implies, ANOVA is based on analyzing the variance that appears in the data. For this analysis, the variance is partitioned into separate sources. The following paragraphs describe how variation is partitioned into sources and how the resulting sources of variation are used to decide whether the variation among means is statistically significant.

Partitioning Variation  The value of any particular score obtained in a between-subjects experiment is determined by three factors: (1) characteristics of the subject at the time the score was measured, (2) experimental error, which causes scores to differ even when all subjects are exposed to the same treatment conditions, and (3) the effect of the treatment, if the independent variable is effective.

Figure 14-4 shows how the total variation in the scores from an experiment can be partitioned into two sources of variability (between-groups variability and within-groups variability). Notice that the example begins with a total amount of variability among scores. Again, this total amount of variability may be attributable to one or more of three factors: your independent variable, individual differences, and experimental error (Gravetter & Wallnau, 2007).

FIGURE 14-4 Partitioning total variation into between-groups and within-groups sources.

The first component resulting from the partition is the between-groups variability. The between-groups variability may be caused by the variation in your independent variable, by differences among the different subjects in your groups, by experimental error, or by a combination of these (Gravetter & Wallnau, 2007). The second component, the within-groups variability, may be attributed to error. This error can arise from either or both of two sources: individual differences between subjects treated alike within groups and experimental error (Gravetter & Wallnau, 2007). Note that variability caused by your treatment effects is unique to the between-groups variability.

The F Ratio  The statistic used in ANOVA to determine statistical significance is the F ratio. The F ratio is simply the ratio of between-groups variability to within-groups variability. Both types of variability that constitute the ratio are expressed as variances. (Chapter 13 described the variance as a measure of spread.) However, statisticians insist on calling the variance the mean square, perhaps because the term is more descriptive. Just as with the t statistic, once you have obtained your F ratio, you compare it against a table of critical values to determine whether your results are statistically significant.

The One-Factor Between-Subjects ANOVA

The one-factor between-subjects ANOVA is used when your experiment includes only one factor (with two or more levels) and has different subjects in each experimental condition. As an example, imagine you have conducted an experiment on how well participants can detect a signal against a background of noise, measured in decibels (db). Participants were exposed to different levels of background noise (no noise, 20 db, or 40 db) and asked to indicate whether or not they heard a tone. The number of times that the participant correctly stated that a tone was present represents your dependent variable. You found that participants in the no-noise group detected more of the tones (M = 36.4) than participants in either the 20-db (M = 23.8) or 40-db (M = 16.0) groups. Table 14-3 shows the distributions for the three groups.

Submitting your data to a one-factor between-subjects ANOVA, you obtain an F ratio of 48.91. This F ratio is now compared with the appropriate critical value of F in Tables 3A and 3B in the Appendix. To find the critical value, you need to use the degrees of freedom both for the numerator (k - 1, where k is the number of groups) and for the denominator, k(s - 1), where s is the number of subjects in each group, of your F ratio. In this case, the degrees of freedom for the numerator and denominator are 2 and 12, respectively.

To identify the appropriate critical value for F (at α = .05), first locate the appropriate degrees of freedom for the numerator across the top of Table 3A. Then read down the left-hand column to find the degrees of freedom for the denominator. In this example, the critical value for F(2, 12) at α = .05 is 3.89. Because your obtained F ratio is greater than the tabled value, you have an effect significant at p < .05. In fact, if you look at the critical value for F(2, 12) at α = .01 (found in Table 3B), you will find your obtained F ratio is also significant at p < .01.

When you report a significant effect, typically you express it in terms of a p value. Alpha refers to the cutoff point that you adopt. In contrast, the p value refers to the actual probability of making a Type I error given that the null hypothesis is true. Hence, for this example, you would report that your finding was significant at p < .05 or p < .01. The discussion in the following sections assumes the "p <" notation.
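The analysis is easy to reproduce in software. The sketch below (SciPy is our assumed tool) runs the one-factor ANOVA on the scores from Table 14-3, shown next, and recovers the F ratio of 48.91 reported above:

    # One-factor between-subjects ANOVA on the signal-detection data.
    from scipy import stats

    no_noise = [33, 39, 41, 32, 37]
    db_20 = [22, 24, 25, 21, 27]
    db_40 = [17, 14, 19, 11, 19]

    f, p = stats.f_oneway(no_noise, db_20, db_40)
    f_crit = stats.f.ppf(0.95, dfn=2, dfd=12)   # critical F(2, 12) at alpha = .05
    print(f"F(2, 12) = {f:.2f}, p = {p:.6f}")   # F = 48.91
    print(f"critical F = {f_crit:.2f}")         # 3.89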


TABLE 14-3  Data From Hypothetical Signal-Detection Study

         NO NOISE   20 DECIBELS   40 DECIBELS
         33         22            17
         39         24            14
         41         25            19
         32         21            11
         37         27            19
  ΣX     182        119           80
  ΣX²    6,684      2,855         1,328
  M      36.4       23.8          16.0

Sometimes the table of the critical values of F does not list the exact degrees of freedom for your denominator. If this happens, you can approximate the critical value of F by choosing the next lower degrees of freedom for the denominator in the table. Choosing this lower value provides a more conservative test of your F ratio.

Interpreting Your F Ratio  A significant F ratio tells you that at least some of the differences among your means are probably not caused by chance but rather by variation in your independent variable. The only problem, at this point, is that the F ratio fails to tell you where among the possible comparisons the reliable differences actually occur. To isolate which means differ significantly, you must conduct specific comparisons between pairs of means. These comparisons can be either planned or unplanned.

Planned Comparisons  Planned comparisons (also known as a priori comparisons) are used when you have specific preexperimental hypotheses. For example, you may have hypothesized that the no-noise group would differ from the 40-db group but not from the 20-db group. In this case, you would compare the no-noise and 40-db groups and then the no-noise and 20-db groups. These comparisons are made using information from your overall ANOVA (see Keppel, 1982). Separate F ratios (each having 1 degree of freedom) or t tests are computed for each pair of means. The resulting F ratios are then compared with the critical values of F in Tables 3A and 3B in the Appendix.

You can conduct as many of these planned comparisons as necessary. However, a limited number of such comparisons yield unique information. For example, if you found that the no-noise and 20-db groups did not differ significantly and that the 40- and 20-db groups did, you have no reason to compare the no-noise and 40-db groups. You can logically infer that the no-noise and 40-db groups differ significantly. Those comparisons that yield new information are known as orthogonal comparisons.


Any set of means has (k - 1) orthogonal comparisons, where k is the number of means.

Planned comparisons can be used in lieu of an overall ANOVA if you have highly specific preexperimental hypotheses. An alternative is to conduct multiple t tests. You should not perform too many of these comparisons even if the relationships were predicted before you conducted your experiment. Performing multiple tests on the same data increases the probability of making a Type I error across comparisons through a process called probability pyramiding (see the next section).

Unplanned Comparisons  If you do not have a specific preexperimental hypothesis concerning your results, you must conduct unplanned comparisons (also known as post hoc comparisons). Unplanned comparisons are often "fishing expeditions" in which you are simply looking for significant differences, and you may need to perform a fairly large number of unplanned comparisons to fully analyze the data.

Two types of error rates are associated with multiple comparisons. The per-comparison error rate is the alpha level for each comparison between means. If you set an alpha level of .05, the per-comparison error rate is .05. The familywise error rate (Keppel, 1982) takes into account the increasing probability of making a Type I error as the number of comparisons increases. It can be estimated with the following formula:

    α_FW = 1 - (1 - α)^c

where c is the number of comparisons made and α is the per-comparison error rate. For example, if you make four comparisons (c = 4) at α = .05, then α_FW = 1 - (1 - .05)^4 = 1 - .95^4 = 1 - .815 = .185.
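In code, the calculation is one line (a trivial sketch reproducing the example above):

    # Familywise error rate for c = 4 comparisons at alpha = .05.
    alpha, c = 0.05, 4
    alpha_fw = 1 - (1 - alpha) ** c
    print(f"alpha_fw = {alpha_fw:.3f}")   # 0.185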

Special tests can be applied to control familywise error, but it is beyond the scope of this chapter to discuss each of them individually. Table 14-4 lists the tests most often used to control familywise error and gives a brief description of each. For more information about these tests, see Keppel (1982, chap. 8).

Sample Size  You can still use an ANOVA if your groups contain unequal numbers of subjects, but you must use adjusted computational formulas. The adjustments can take one of two forms, depending on the reasons for the unequal sample sizes.

Unequal sample sizes may be a by-product of the way that you conducted your experiment. If you conducted your experiment by randomly distributing your participants across groups, the group sizes may simply have turned out to be unequal. In such cases, unequal sample sizes do not result from the properties of your treatment conditions. Unequal sample sizes also may result from the effects of your treatments. If one of your treatments is painful or stressful, participants may drop out of your experiment because of the aversive nature of that treatment. The death of animals in a group receiving harsh conditions is another example of subject loss related to experimental manipulations that results in unequal sample sizes.


TABLE 14-4  Post Hoc Tests

Scheffé test
  Use: To keep the familywise error rate constant regardless of the number of comparisons to be made.
  Comments: Very conservative test; the Scheffé correction factor corrects for all possible comparisons, even if not all are made.

Dunnett test
  Use: To contrast several experimental groups with a single control group.
  Comments: Not as conservative as the Scheffé test because only the number of comparisons made is considered in the familywise error rate correction.

Tukey-a HSD test
  Use: To hold the familywise error rate constant over an entire set of two-group comparisons.
  Comments: Not as conservative as the Scheffé test for comparisons between pairs of means; less powerful than the Scheffé for more complex comparisons.

Tukey-b WSD test
  Use: Alternative to the Tukey HSD test.
  Comments: Not as conservative as Tukey's HSD test but more conservative than the Newman-Keuls test.

Newman-Keuls test
  Use: To compare all possible pairs of means and control the per-comparison error rate.
  Comments: Less conservative than the Tukey test; the critical value varies according to the number of comparisons made.

Ryan's test (REGWQ)
  Use: Modified Newman-Keuls test in which critical values decrease as the range between the highest and lowest means decreases.
  Comments: Controls familywise error better than the Newman-Keuls test but is less powerful than the Newman-Keuls test.

Duncan test
  Use: To compare all possible pairs of means.
  Comments: Computed in the same way as the Newman-Keuls test; with more than two means to be compared, it is less conservative than the Newman-Keuls.

Fisher test
  Use: To compare all possible combinations of means.
  Comments: Powerful test that does not overcompensate to control the familywise error rate; no special correction factor is used; a significant overall F ratio justifies the comparisons.

NOTE: A conservative test is one with which it is more difficult to achieve statistical significance than with a less conservative test. "Power" refers to the ability of a test to reject the null hypothesis when the null hypothesis is false.
SOURCE: Information in this table was summarized from Keppel, 1982, pp. 153-159; Pagano, 2007; Winer, 1971; and information found at http://www2.chass.ncsu.edu/garson/pa765/anova.htm

If you end up with unequal sample sizes for reasons not related to the effects of your treatments, you can use an unweighted means analysis rather than discarding scores to equalize the groups before conducting the ANOVA. This analysis gives each group in your design an equal weight in the analysis, despite the unequal sample sizes. If the difference in sample sizes was planned or reflects actual differences in the population, you should instead weight each group mean according to the number of subjects in the group, with means based on fewer subjects receiving lower weights. See Keppel (1973, 1982) for discussions of how to handle unequal sample sizes in ANOVA.

The One-Factor Within-Subjects ANOVA

If you used a single-factor within-subjects design for your experiment, the statistical test to use is the one-factor within-subjects ANOVA. As in the between-subjects analysis, the total variation is partitioned into the variation caused by the level of the independent variable and the variation within treatments. The within-subjects source of variance also can be partitioned into two sources: individual differences among subjects (who each receive the same treatments) and experimental error. Subjects are treated as a factor in the analysis (S). You then subtract S from the usual within-groups variance used as the denominator of the F ratio, thus making the F ratio more sensitive to the effects of the independent variable.

Sometimes Latin square designs are used to counterbalance the order in which subjects receive the treatments. For a discussion of the Latin square ANOVA, see Keppel (1973, 1982).

As before, a significant overall F ratio tells you that significant differences exist among your means, but, as usual, it does not tell you where the differences occur. To determine which means differ, you must further analyze your data.


The tests used to compare individual means are similar to those used in the between-subjects ANOVA.

The Two-Factor Between-Subjects ANOVA

Chapter 10 discussed the two-factor between-subjects design, in which you manipulate two independent variables and randomly assign different subjects to each condition. The analysis appropriate for this design, the two-factor between-subjects ANOVA, evaluates the separate effect of each factor (the main effects) and the combined effect of the two factors (the interaction) on the dependent variable. (If you are unclear about the meanings of these terms, review Chapter 10.) The analysis is more complex than the one-factor case because it must determine the statistical significance of two main effects and an interaction.

Main Effects and Interactions  If you find both significant main effects and a significant interaction in your experiment, you must be careful about interpreting the main effects. A significant main effect indicates that an independent variable has an effect on the dependent variable, regardless of the level of your other independent variable. A significant interaction, however, shows that the effect of each independent variable depends on the level of the other. Consequently, you should avoid interpreting main effects when an interaction is present.

You should also be aware that certain kinds of interactions can mask main effects. Both independent variables may have been effective, and yet the statistical analysis will fail to reveal statistically significant main effects for these factors. Figure 14-5 shows the cell means for such a hypothetical experiment.

The diagonal lines depict the functional relationship between Factor A and the dependent variable at the two levels of Factor B. The fact that the lines form an X (rather than being parallel) indicates the presence of an interaction. Notice that Factor A affects the level of the dependent variable at both levels of Factor B but that these effects run in opposite directions.

The dashed line in Figure 14-5 represents the main effect of Factor A, obtained by averaging the upper and lower points to collapse across the levels of Factor B. This dashed line is horizontal, indicating that there is no change in the dependent variable across the levels of Factor A. Although Factor A clearly affects the dependent variable at each level of Factor B, its average (main) effect is zero.

Logically, if the interaction of two variables is significant, then both variables are involved in determining the scores. Consequently, if you have a significant interaction, ignore the main effects, whether or not the main effects are statistically significant.

Finally, most of the time you are more interested in interactions than in main effects, even before your experiment is conducted. Hypothesized relationships among variables are often stated in terms of interactions.

FIGURE 14-5 Graph showing a two-way interaction that masks main effects.

Interactions tend to be inherently more interesting than main effects. They show how changes in one variable alter the effects of other variables on behavior.

Sample Size  Just as with a one-factor ANOVA, you can compute a multifactor ANOVA with unequal sample sizes. The unweighted means analysis can be conducted on a design with two or more factors (the logic is the same). For details on modifications to the basic two-factor ANOVA formulas for weighted means and unweighted means analyses, see Keppel (1973, 1982).
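Before turning to a published example, here is a minimal sketch of a two-factor between-subjects ANOVA in software. The use of statsmodels is an assumption on our part, and the small data frame below is invented; it merely mimics the structure of the 3 x 3 design discussed next:

    # Two-factor between-subjects ANOVA: two main effects and an interaction.
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    df = pd.DataFrame({
        "race": ["Asian", "Black", "White"] * 6,
        "problem": ["normal"] * 6 + ["over"] * 6 + ["under"] * 6,
        "typicality": [5.2, 6.4, 6.3, 5.4, 6.2, 6.5,
                       5.0, 4.1, 4.2, 4.8, 4.0, 4.3,
                       3.8, 4.4, 4.5, 3.9, 4.3, 4.4],
    })

    model = ols("typicality ~ C(race) * C(problem)", data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))   # F and p for each effect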

ANOVA for a Two-Factor Between-Subjects Design: An Example  An experiment conducted by Doris Chang and Stanley Sue (2003) provides an excellent example of the application of ANOVA to the analysis of data from a two-factor experiment. Chang and Sue were interested in investigating how the race of a student affected a teacher's assessments of the student's behavior and whether those assessments were specific to certain types of issues. Teachers (163 women and 34 men) completed a survey on which they were asked to evaluate the behavior of three hypothetical children. Each survey included a photograph of either an Asian-American, an African-American, or a Caucasian child. The survey also included a short description of the child's behavior. The child's behavior was depicted as falling into one of three "problem" types: (1) "overcontrolled" (anxious to please and afraid of making mistakes), (2) "undercontrolled" (disobedient, disruptive, and easily frustrated), or (3) "normal" (generally follows rules, fidgets only occasionally, etc.). These two variables comprise the two independent variables in a 3 (race of child) x 3 (problem type) factorial design. The survey also included several measures on which teachers evaluated the child's behavior (e.g., seriousness, how typical the behavior was, attributions for the causes of the behavior, and academic performance).

We limit our discussion of the results to one of the dependent variables: typicality of the behavior. The data were analyzed with a two-factor ANOVA. The results showed a statistically significant main effect of problem type, F(2, 368) = 46.19, p < .0001. Normal behavior (M = 6.10) was seen as more typical than either undercontrolled (M = 4.08) or overcontrolled (M = 4.34) behavior.


The ANOVA also showed a statistically significant race by problem-type interaction, F(4, 368) = 7.37, p < .0001.

Interpreting the Results  This example shows how to interpret the results from a two-factor ANOVA. First, consider the two main effects. There was a significant effect of problem type on typicality ratings. Normal behavior was rated as more typical than overcontrolled or undercontrolled behavior. If this were the only significant effect, you could then conclude that race of the child had no effect on typicality ratings because the main effect of race was not statistically significant. However, this conclusion is not warranted because of the presence of a significant interaction between race of child and problem type.

The presence of a significant interaction suggests that the relationship between the two independent variables and your dependent variable is complex. Figure 14-6 shows the data contributing to the significant interaction in the Chang and Sue (2003) experiment. Analyzing a significant interaction like this one involves making comparisons among the means involved.

Because Chang and Sue (2003) predicted the interaction, they used planned comparisons (t tests) to contrast the relevant means. The results showed that the typicality of the Asian-American child's behavior was evaluated very differently from that of the Caucasian child and African-American child. Teachers saw the normal behavior of the Asian-American child as less typical than the normal behavior of either the Caucasian or African-American child. Teachers saw the overcontrolled behavior by the Asian-American child as more typical than the same behavior attributed to the African-American or Caucasian child. The undercontrolled behavior was seen as less typical for the Asian-American child than for the African-American and Caucasian children, respectively. So the race of the child did affect how participants rated the typicality of a behavior, but the nature of that effect depended on the type of behavior attributed to the child.

FIGURE 14-6 Graph showing an interaction between race and problem type (typicality ratings for Asian-American, African-American, and Caucasian children under normal, overcontrolled, and undercontrolled problem types). SOURCE: Chang and Sue, 2003; reprinted with permission.

The Two-Factor Within-Subjects ANOVA

All subjects in a within-subjects design with two factors are exposed to every possible combination of levels of your two independent variables. These designs are analyzed using a two-factor within-subjects ANOVA. This analysis applies the same logic developed for the one-factor within-subjects ANOVA. As in the one-factor case, subjects are treated as a factor along with your manipulated independent variables.

The major difference between the one- and two-factor within-subjects ANOVA is that you must consider the interaction between each of your independent variables and the subjects factor (A x S and B x S), in addition to the interaction between your independent variables (A x B). Because the basic logic and interpretation of results from a within-subjects ANOVA are essentially the same as for the between-subjects ANOVA, a complete example isn't given here. A complete example of the two-factor within-subjects ANOVA can be found in Keppel (1973).

Mixed Designs

In some situations, your research may call for a design mixing between-subjects and within-subjects components. This design was discussed briefly in Chapter 11. If you use such a design (known as a mixed or split-plot design), you can analyze your data with an ANOVA. The computations involve calculating sums of squares for the between factor and for the within factor.

The most complex part of the analysis is the selection of an error term to calculate the F ratios. The within-groups mean square is used to calculate the between-subjects F, whereas the interaction of the within factor with the within-groups variance is used to evaluate both the within-subjects factor and the interaction between the within-subjects and between-subjects factors. Keppel (1973, 1982) provides an excellent discussion of this analysis and a complete worked example.

Higher-order and Special-Case ANOVAs

Variations of ANOVA exist for just about any design used in research. For example, you can include three or four factors in a single experiment and analyze the data with a higher-order ANOVA. In a three-factor ANOVA, for example, you can test three main effects (A, B, and C), three two-way interactions (AB, AC, and BC), and a three-way interaction (ABC). As you add factors, however, the computations become more complex and probably should not be done by hand. In addition, as discussed in Chapter 10, it may be difficult to interpret the higher-order interactions with more than four factors.

A special ANOVA is used when you have included a continuous correlational variable in your experiment (such as age). This type of ANOVA, called the analysis of covariance (ANCOVA), allows you to examine the relationship between experimentally manipulated variables while controlling another variable that may be correlated with them. Keppel (1973, 1982) provides clear discussions of these analyses and other issues relating to ANCOVA.

To summarize, ANOVA is a powerful parametric statistic used to analyze one-factor experiments (either within-subjects or between-subjects) with more than two treatments, and to analyze multifactor experiments.


ANOVA is intended for use when your dependent variable is scaled on at least an interval scale. The assumptions that apply to the use of parametric statistics in general (such as homogeneity of variance and a normally distributed sampling distribution) apply to ANOVA.

ANOVA involves forming a ratio between the variance caused by your independent variable plus experimental error and the variance (mean square) caused by experimental error alone. The resulting score is called an F ratio. A significant F ratio tells you that at least one of your means differs from the other means. Once a significant effect is found, you then perform more detailed analyses of the means contributing to the significant effect in order to determine where the significant differences occur. These tests become more complicated as the design of your experiment becomes more complex.

NONPARAMETRIC STATISTICS

Thus far, this discussion has centered on parametric statistical rests. In some SILLia-lions, however, yon inay not be able to LISe a paramerric test. WheiT your data do normeetrhe assumptions of a parametric test or whei\ your dependent variable wasscaledo1T a nominal or ordinal scale, consider a ITonparametric rest. This sectioiT discussesThree ITonparametric rests: chi-square, the Mann-Whimey U rest, and the Wilcoxonsigned-ranks rest. Yotiinight consider using Inariy other nonparametric tests. For acomplete descriprioi\ of these, see Siegel and CastellaiT (1988). Table 14-5 summarizessome Information on these and other nonparamerric rests

Chi-Square

When your dependent variable Is a dichotomous decision (such as yesjno or gtiiltyjnotgtiilry) or a frequency count (such as how many people voted for Candidate A andhow many for Candidare B), The statistic of choice is chi-square (X'). Versions ofchi-square exist for studies with one and two variables. This discussion is limited tothe two-variable case. For further information o1\ the one'variable analysis, see eitherSiegel and Castellai\ (1988) or RDScoe (1975).

Chi-Square for Contingency Tables Chi-sqtiare for contingency tables (also calledthe chi-sqi{aTe test for independence) is designed for frequency data in which the Tela-Lionship, or contingency, between Two variables is to be determined. In a voter prefer-errce study, for example, yoti might have liteasured sex of respondent in additioi\ tocandidate preference. You may wantto know whether the two variables are related orIndependent. The chi-square test for contingency Tables compares your observed cellfrequencies (Those you obtained in your study) with the expected cellfTeqi{encies (thoseyou would expect to find itchance alone were operating).

A study reported by Herbert Harari, OreiT Harari, and Robert White (1985) pro-vides aiT excellent example of the application of the chi-square test to the analysis offrequency data. Harari et al. investigated whether male participants would help thevicriin of a simulated rape. Previous research on helping behaviorsuggested that indi-viduals are less likely to help someone in distress If they are with others than if they

*

*;

.}t

9

,

*,

,

.,

,,

. 4 t,,

^.

!*.,*;;!$*.

.,.

.,

.* ,

,

.

,

,

:.

.,I, -

%.:I*.

14;,,.

TABLE 14-5

. .

TEST

Noriparametric Tests

Binomial

Chi-square

Kolmogorov-Sinimov

MINIMUMSCALEOFMEASUREMENT

Chi-square

Fisher exact probability

Kolmogorov-Sinirnov

Wald-Wolfowitzruns

Moses test of extremereactions

Randomization rest

NONPARAMETRIC STATISTICS

one'sample Tests

Nominal

Nominal

Ordinal

Two ladepet, dent Samples

coA, I\, IENTs

Nominal

Nominal

Ordinal

Ordinal

Ordinal

Interval

Can be used as a more powerfulalternative to chi-square

Mann-Whitney U

MCNemar

Alternative to chi-square whenexpected frequencies are smallMore powerful than theMann-Whitney U test

Less powerful thanMann-Whitney U test

Teststhe difference betweenmeans without assuming normalityof data orhomogeneity of varianceGood alternative to Itest whenassumptions violated

Sign

Wilcoxon matched pairs

Walsh rest

Ordinal orabove

. ...

Two Reluted Samples

Randomization testformatched pairs

Nominal

Ordinal

Ordinal

Interval

..*^.

Good test when you have abefore-after hypothesisGood when quantitative measuresare not possible, but you can rankdata

Good alternative to Itest whennormality assumption is violatedGood nonparametric alternativeto the t test; data must bedistributed symmetricalIy

Interval

co"tin"es

Page 13: Bordens and Abbott 2008

**..,

440

tv 31,

CHAPTER 14 , Using InferentialStatisric

TABLE 14-5

I

...

,.*'

,.

,. ..

TEST

..

*

,...

NoriparametricTests co"tin"ed

Cochran Q test

Friedman two-wayANOVA

MINIMUMSCALEOFMEASUREMENT

Chi-square

Kruskal-Wallis one-wayANOVA

More Thon Two Reloted Samples

OURCE: Data from Roscoe, 1975, and Siegel and Castellan. 1988

arealone. Hararietal. conducted antidinvestiati '.'Ing a one or in noriinteracring groLips) were exposed to a mock in re (a malee o t e experimenters grabs a female confederate and dra s I\er

s I e reqtiencies of participants I}elping Linder the two condirTom a c i'square test performed on These data showed a si. rim , "I. -lionshjpberweenthedecisioiTroofferhj d h ' g icanjre, .ions ip erween the decisioiT roofferhelp andwhetherpartici antsw II cuayinoreieytoTejpthajtthosewhc)were

linttutions of Chi-Square A problem arises If any of your ex ected cell fre, Lien-(Graverrer6{Walln, ,, 2007). Y' '''q""'naYheartificiallyjnftat, d(Gin, er, ere*wall, an, 2007). Y, ,, h, . h ' ""' "' 'in ""

Iact pro a lity test (see Roscoe, 1975, or Siegel ,* Castellan, 1988) is anTingencytable(Roscoe, 1975) ' con-

signi cantchi-square tellsyotirhatyourrwovariables aresi n 11 I Iexamp e, a yon now Is That group size and helpino are related. AwithANOVA, however, chi, -d Psarereare. A,

an two categories o each variable exist. To determine the locus

thecontingencytable ceso

Nominal

More Than Twolndependent Samples

Ordinal

Most useful when data fallinronatural dichotomous categories

Nominal

Ordinal

COMMENTS

Good alternative to a one'factorANOVA when assumptions areviolated

-*.... .*.~...,

'.*..

IJ, ,._.,,. . .," ..,'.' ' '~.i

TABLE 14-6

*,*,*

Number of Participants Helping Mock Rape Victim,in Two Conditions

,,..,. ....... ~' 'fLL. ,\. ,..,. .., A .., ~"'

PARTICIPANTSINGROUPS

PARTICIPANTSALONE

SOURCE: Data from Hatari, Hatari, and White, 1985

The Mann-Whitney UTest

A h ' owerfuliTonpaTainetric test is The Mann-Whitney U test. The Mann-Whime . tcst cal\ be LISed whci\ y. tn' dependent \, ariable is scaled on at east anordinal scale. 11 is alsLt a good alternative to the I test whei\ yotir data do ITot Ineel t e, . intrions of the I rest (sucl\ as \\, heri the scores are I\or normally distributed, w enthe variances are heterogeneous, \IT whei\ yoLi have smallsamplc sizes).

CalculatioiT of the Mani\-Whimey . tcstisfairly simple. The firststep tsu> coin-bine the data from your two grotips. Scores are ranked (froin highest to lowest) an\I, bele I accordin, to the group to \\, hich they belong. H there Is a difference .etween

Yourgroups, ' --ILld'tributed. AUScoreisalculated for each group in your experiment. The lower of the Iw!I . scores o lainetheIT evaluated against critical\, alues of U. If the lower of the Iwt\ U scores is sintt eT

11 ai\ the tabled U value, yon theiT conclude your two groups differ significant y

The Wilcoxon Signed Ranks Test

If I conducted a single-factor experiment LISing a correlated-samples (relatet) ort hed-tails desi, n, the Wilcoxon signed ranks test would he a good statistic to

analy, eyour ata- *"""' ' I, ,,,,,, ked(disre. archngthesign of the difference score) from smallest to largest. Next, each rank is assigned a

'live or I\e alive sign, depending o1\ whether the difference score was positive ornegative. The positive and I\egative Tan s arc t ei\ summeL.negai P I 11. ettial. However, ifthe

f h isillve and ne alive ranks are very different, theI\ the nullhypothesisai\ Ite re'ected. For more informal101\ o1\ the Wilcoxoi\ signed ranks test, see iege

and Castellan (1988).

Parametric Versus Nonparametric Statistics

N\IT arametric statistics are LISeful when yoLir data do ITot meet the assumptionsf atametric statistics. If yoLi have a choice. cl\QOSe a parametric statistic over a

NONPARAMETRIC STATISTICS

*

INTERVENED

34

J

26

DIDNOTINTERVENE

60

..". *~,.* .~- "-~ ,...*. - .. ~

6

14

20

40

40

..,,

"-^

Page 14: Bordens and Abbott 2008

\

442 CHAPTER 14 , Using InferentialStaristics

nonparametric one becaLise parametric statistics are generally more powerful. That ,ua y provi us a more sensitive rest of the null hypothesis

secon problein with noriparametric statistics is that appropriate versionsOuhjd I q Y, Wenesigningyourstudy,

parametric statistic calT be used.

SPECIALTOPICSIN INFERENTIAL STATISTICS

The application of The appropriare Inferentialstatisric ina a coforward. Howe , If YPP"'SimPeansrraight,amerric or nonparametric statistic, when using any inferential statistic. This sectioniscusses some special topics to consider when deciding on a strategy to statisticalIevaluated, ,,. ,

Power of a Statistical Test

Inferentialstatisrics are designed to help you determine The validiOrhes's. C I ITeTevaiityotenullhy-are inconsistent with The nullhyporhesis. The power of a statistical test is its ab'I'erect t ese differences. Put in statistical terms, power is a statistic's abilit to corre 11

reject the null hypothesis (Gravetrer Is{ Wallnau, 2007). A powerful statistic is moresensitive to differences in your data Than a less powerful one'

The issue of the power of your statistical rest is an jin ortant onti ypor esis imp ies that your independent variable affected your dependent

to reject the null hypothesis Is not caused by a lack of power in our statistical re I.e power of yourstatisricaltest is affected by your chosen alpha level, the size ofduced '

Alpha Le"el As you reduce your alpha level(e. g. , froin .05 to .01), 0Lireduc hep a coe As you reduce your alpha level(e. g. , froin .05 to .01), yoLireduce theprobability of making a Type I error. Adopting a more conservative al ha Ie I kit more ithculr to reject the null hypothesis. Unfortunately, it also reduce

Iven a constant error variance, a larger difference between means is re uired Iobtain statistical significance with a more conservative alpha level.Sample Size The power of ourstatisticalIs ICa test Increases wit I e size of your sample

particu ar, the standard errors of the means from your treatments will be lower, so IhPOSiions o t e popu ation means fall within narrower bounds. Consequentl,

hypothesis when it is false.,

SPECIALTOPICS IN INFERENTIALSTATISTICS

One' Tailed VetsMS Two, Toiled Tests A two, tailed restisless powerful than a one'tailed test. This can be easily demonstrated by looking at the critical values oftfoundin Table 2 in the Appendix. At 20 degrees offTeedom, the critical value at u = .05 fora one' tailed test is 1.73. For a two-tailed test, the critical value is 2.09. It is thus easierto Teect the ITUll hypothesis with the one' tailed test than with the two-tailed test.

Effect Size The degree to which the manipulation of your independent variablechan us the value of the dependent variable is Termed the effect size. To facilitatecomparison across variables and experiments, effect size is usually reported as a PTO-ortion of the variation in scores within the treatments under comparison; for exam-Ie, the effecrsize for the difference betweei\ two treatment means might be reported

as (Mz - Mj)/s, where s is the pooled sample standard deviation (Cohen, 1988).Measured in this way, effect size estimates the amount of overlap between the two

o ulation distributions from which the samples were drawn. Large effect sizes in-d'cate relativeI little overlap: The mean of Population Z lies far into one tail of thedistribution of Population I, so a Teal difference in population means is likely to bedetected in the inferentialtest (good power). Small effect sizes indicate great overlapin the o Ination distributions and Thus, everything else being equal, relatively littleower. However, because inferGrillaltesrs rely on the sampling distribution of the test

statistic rather thai\ the population distributions, you may be able to improve powerin such cases by, for example, increasing the sample size.

Determining Power Because the business of inferentialstatistics is to allow you todecide whether or not to reject the null hypothesis, the issue of power is important.You want to be reasonably sure that your decision is correct. Failure to achieve sta-tisticalsignificance in your experiment (thus not rejecting the null hypothesis) canbe caused by many factors. Your independent variable actually may have Do effect,

o r ex eriment may have been carried outso poorly that the effect was buried invariance. Or maybe your statistic simply was riot powerful enough to detect the

difference, or you did not use enough subjects.Although alpha (the probability of rejecting the null hypothesis when it is True)

can be set directly, it is not so easy to determine what the power of Your analysis wibe. However, you can work backward from a desired amount of power to estimate t esample sizes reqtiired for a study. To calculate these estimates, you must be wiling tostate The amount of power required, the magnitude of the difference that You expectto find in Your experiment, and the expected error variance.

The expected difference between means and the expected error variance can eestimated from pilotreseaTch, from theory, or froin previous research in your area. Forexam Ie, if revious research has found a small effect of your independent varia e(. g. ,Zp. ints), you. anuseT rs, Saner' bjjh. ThreinOa Teed-on acceptable or desirable levelofpower (Keppe1, 1982). Ifyou are willing andable to specify the values mentioned, however, you can estimate the size of the samp eneeded to detect differences of a given magnitude in your research. (See Gravetter6< Wallnau, 2007, or KGppe1, 1982, for a discussion on how to estimate the requiresample size. )

,,

,.

."*,"

.,

..., ,. .*.. .,' .,., . ., ..,. ,., - .... ,

^ ...~... .. .,

441

A . ~ ,.-., ,, ,~.,"

*,. .-,,

,.

Page 15: Bordens and Abbott 2008

444 CHAPTER 14

Too much power can be as bad as too little. If you Tai\ enotigl\ subjects, yoti couldconceivably find statistical significance in even the InOSI minute and trivial of diffeiences. Similarly, when you use a correlation, you can achieve statistical significanceeven with small correlations tryou include enough subjects. Consequently, yoursamPIG should be large enough to be sensitive to differences between treatments but ITotso large as to produce significant but trivial results

The possibility of your results being statisticalIy significant and Yet trivial 11THyseem strange co you. Ifso, The next section may clarify this concept

r

I

Using Inferential Statistics

Statistical Versus Practical Significance

To say that results are significant (statisticalIy speaking) merely indicates LITat the observed differences betweensample means are probably reliable, northe result of chanceConfusion arises when You give the word significantitsii\ore coinmoiT nTeaning. Something "significant" in This more common sense is important or worthy of note

The fact That the Treatment means of Your experiment differ significantly ina\or may not be important. If the difference is predicted by a particular theory and ITotby others, then the finding may be important because it supports the theory oveithe others. The finding also may be important if it shows that one variable stronglyaffects another. Such findings may have practical implications by demonstrating, forexainple, the superiority of a new therapeutic technique. In such cases, a statisticalIysignificant (i. e. , reliable) finding also may have pro^Cttlsign"cance

Advertisers sometimes purposely blur the distinctioi\ herweeiT statistical and PIacticalsignificance. A few years ago, Bayer aspirii\ announced The results of a "hospitalstudy on pain other than headache. " Evidently, groups of I\OSpitalpatienrs were Treatedwith Bayer aspirin and with several other brands. The advertisement glossed over Thedetails of the study, bur apparently the patients were asked to rate the severity of theirpain at some point after raking Bayer or Brand X (the identities of both brands wereprobably concealed). According to the ad, "the results were significant-Bayer was better. " However, the ad did riotsay in what way the results were significant. Evidently, theresults were statisticalIy significant and thus probably riot caused by chance. Withoutany information aboutthe pailTratings, however, yoLido I\orkno\v Ifthis finding has anypracticalsignificance. It may be That the Bayer and Brand X group ratings differed byless rhaiT I point on a 10-pointscale. Although this average difference Inay ITave beenreliable, it also may be the case that I\o individual could tellrhe difference between twopains so close together on the scale. 11\ that case, the statisticalIy significant differencewould have no practicalsignificance and would provide no Teasoi\ for. choosing Bayerover other brands of aspirin

^

- , 'tl. ht I , Ithar oLiderermine Is reasonable for yoLn' PUTpos ,, - , 'tl. ht I ,elrhar yoLiderermine Is reasonable for yoLn' PUTp\ ,

, ". t"inureTeliablC"Thai\SignificantTeSLiiSOtrainC\, ,,

werercstingthecffectivenCSSO, , ,,,, jousrhai\a'YP"

Trot. Ifyoti^etain The ITUlll^, POT \esis w Tel\ It Is , .convictedasaTeSLit- . balancebetweenTVPel

, \ H '*Tsunfortunutcly, most journalswilli\ulpLi ish t nt at least at The p < .05 level. Chapter 3 exam'NITificantaTlcaStaTTith<. eve. t ,

SIoi\ of publicatioi\ practices

The Meaning of the Level of Significance

In the behavioral sciences, an alpha level of .05 (or I chance in 20) Is usually considered the maximum acceptable rate for Type I errors. This level provides reasonableprotection against Type I errors while also maintaining a reasonable levelofpowerformost analyses. Of course, ifyou wantto guard more strongly against Type I errors, youcan adopt a more stringent alpha level, such as the .01 level(I chance in 100)

, ALTOPICSIN INFERENTIALSTATISTICECIALTOPICSIN INFERENTIALSTATISTl

Data Transformations'ansfoTm your data with the appropriate

f tion. TransformingdarameansconvertingyoLirorigidatatranSfOrma"""' b CCDjniishedbyaddiUgorSUb,

V f-QineachscorecanmaketheiTurnersinanagam It Subtractingaconsrantfroineachscorecan ,to

'richscoreinighrremovei\egativenumers- .. ncdis,

b doesn't change. ThemeaiTofthedistrititiOD g ' ddeviationdoesI\Or- "V'' lidjj, ,,, t, qnsfoTmqtionS, SimplyLevia s ItTansformations, calledlineartTansfoTmutions, SImpyI"n, etheinagniiudeofrhei\umbersTepresentingy ,thescaleofmeaStiTement. ,, 110ns. MYOUT

h In tions, yoticouldchooseadifferenrstatistic.datadonotmeettheseassumptions, yoticOLl htcanbGdatadonotmeettheseassumptiOnS, VO\I -,, ICthatcanbG

d in transformations and The conditions tin er w IC

*

Page 16: Bordens and Abbott 2008

446 CHAPTER 14

TABLE 14-7 DataTransformations and Uses

TRANSFORMATION

Using Inferential Sintisrics

Square root

Arcsin

x' = Vie

x'=V^I

When cellmeansandvariances are related, thistransformation makesvariances more horno

geneous; also, if data showa moderate positive skewWhen basic observationsX' = 2 arcsin V5<are proportions and have a

X ,: (1/2n)' binomial distribution

Normalizes data with

severe positive skew

FORMULA

Log

or

Formula used if basic observations are frequencies or Ifvalues ofX are smallbFormula used ifvalues ofX are close to O or I

Formula used ifvalue ofX is equal to or near OSOURCE:Information summarized from Tabachnick and Fide11, 2001, and Winer, 1971

or

X' = Z arcsin

X' = log X

X' = log(X + I)or

Data transformations to make data conforin 10 the assumptions of a statistic arebeing LISed less and less frequently (KGppe1, 1973). ANOVA, perhaps the most coin-monly used inferentialstatistic, appears To be very robust against even moderatelyserious violations of Its assLimptions L!riderlying the test. For example, Winer (197 I)has demonstrated that evei\ if the within-cell variances vaTy by a 3:1 ratio, the F testis I\ot seriously biased. Transformations t)f the data Ina^ riot be ITecessary in thesecases. Also, \vhei\ yon trans{trim your data, your conclusions must be based on thetransformed scale and I\or the original. In most cases, this is ITor a problem. However,Keppe1(1973) provides ai\ example in which a square Tool transformation changedsignificantly the relationship betweei\ Two means. Prior to transformation, the meanfor OroLip I was lower than the Inean for Group Z. The opposite was true after trans-formation

Use data transformations only when absolutely I\ecessaTy because they can betrick . Sometimes Transformations of data correct one aspect of the data (such asrestoring normality) but induce new violations of assumptions (such as heterogeneityof variance). If yoti inList LISe a data transformation, before going forward with yoLiranalysis, check to be sure that the transformation had the intended effect.

Alternatives to InferGritial Statistics

Inferentialstaristics are tools to help yon inake a decisioi\ about the null hypothesis.Essentially, inferenrial statistics provide yoLi witl\ a way to test the reliability of a

USE

ECIALTOPICSIN INFERENTIALSTATISTICS

.' thenunh orhesisatp<. 05, it meansthatasingleexperimenr. WhenvOUTeie d, Id, ccuronlvOD"

d obabl notduetochancebutrathertothee ecto I e '

.O a havedarathatbadlyviolatetheassumprionso parameithnoappTopTiatei\DriparametTicstatisticrouseinstea ,

Thereliabilityofyourdatabyreplication- liable,,picationm"' ' I f eachreplicationReplicationdoes

,, iginalexperime''. thnrheoriginalcontext-Theorigll P atameteTswithii\Theoriginalcontext. enew experiment will provi e a c eInformatio" atOSmall, ridesIgnsorsituationsin

' Youcanincludeai\elementof replicationinhichviolationsofassumptjonsoccur. You can incue fia,whichviolationsofaSSUmptiOnSO , Itj, , OUTOWnfi"'

I th result of "noise. " Inferentialstatistics can con TohLiman tendency To interpret every appaTenweremeaningful athereforeinayfail

h clearl shownbyTeplication. ACasempointisprovi yseries of experimentscon ucte y

~.

Page 17: Bordens and Abbott 2008

\

448 CHAPTER 14 , Using InfercnrialStarisrics

e ecr o predictable versus tinpredicrable shock schedules on pain sensitivir . In e hexperiment, Three groups of eightrats were exposed to a schedtile of predictable shock,

or pain sensitivity by means of the "tail-fuck" rest. In the tail-flick test, a hot beam fI was octise on tle rat's tai. The length of rime elapsing Lintil the rat flicked 'I

rail out of the healn (a protective reflex) indicated the degree of am sensitivit .

establishedfindin. I ad" ,h , Picatingawe,sensitive t ai\ the group exposed to predictable shock. However, this effect was tstatisticalIysignificant(p>. 05). '

aramerers of the experiment were twice altered in ways that were ex ected tincrease The size of the predictability effect (if it existed), and the experiment wasrep icate . However, each replication produced virtually the identical result. On ea hoccasion, I e 11npredicrable shock group demonstrated lesssensitivir to am tha Ihpre ICta e shock group, and each time this difference was riotstatisticall SI rim

The problemcouldbedealtwirhbytakingineasures to increase the f h

', o o so won appear 10 e a waste of resources. In this case, the reliabilitof Ihfid I Sources. ntiscase, thereliabilityanalysis itself indicated that the results were probably nor reliable.of therese-I .A ganareiTortegoaja particLi ar way simply becaLise a particular Inferential statistic Is available to analsuc a esign. Much like designing Your experiment before developing h orheses,

variableThewa Idjk P"YOUrinepen, ,,

theI\ select the method of analysis (whether inferential statistic or re 11catiworks best for. that design.

SUMMARY

is c apter has reviewed some of the basics of InferGritial statistics. Inferenrialfullcs go eyond simple description of results. They allow you To determine wh thr e i erences observed in yoLir sample are reliable. Inferentialsratistics allow ou t

e a ecisioit a otit The viability of The null hypothesis (which states That there isnullhoth h g proaiityorejectingthe

ANOVA)Inakeassuintioi\ ab, ,, h "" 'stettesra, dexamp e, t ese tests assume that the sampling distribution of means is normal a dfo h d Tametricstaristicsareesignedvio ate I e assumptions of a parametric test or your data are scaled on a nominal

REVIEW QUESTIONS

ordinal scale, a noriparametric statistic can be used (such as chi-square or the Mann-Whimey U test). These tests are LISually easier to compute than parametric tests.However, they are less powerful and more limited in application. Noriparametric sta-tisrics may nor be available for higher-order factorial designs.

Statistical significance indicates that the difference between your means wasunlikel if only chance were at work. Itsuggests that Your Independent varia e a

f{ I. Two factors contribute to a statisticalIy significant effect: the size of thedifference betweei\ means and The variability among the scores. You can have a argedifference between Ineans, but if the variability is high, You may not find starisricaincance. Conversely, you may have a very small difference and find a significant

effect if The variability Is low.Consider The power of Your statistical test when evaluating your results. I you o

not find statistical significance, perhaps no differences exist. Or it could mean that Yourrest was not sensitive enough to pick up small differences that do exist. Sample size is

'In ortanr contributor to power. Generally, the larger the sample, the Inore POWe uthe statistic. This Is because larger samples are more representative of the un er yingo ularions than are smallsamples. Use a sample that is large enough To e sensitive to

differences but riotso large as to be oversensitive. There are Inethods for determiningtimalsain Ie sizes for a given level of power. However, you must be willing and ab e

to s ecif an expected magnitude of the treatment effect, an estimate o error vari-ance, and The desired power. The first two cal\ be estimated from pilot data or previousresearch. Unfortunately, there is no agreed-o1\ acceptable levelofpower.

A al ha level of .05 is the largest generally acceptable levelfor Type I errors.This value has been chosen because It represents a reasonable compromise etweenT e I and Type H errors. In some cases (such as in applied research), The .05 Ieve

b conservative. However, journalsprobably will norpublishresultsthatfaimay be too conservative. However, journa s pto Teach the conventional level of significance.

Data transformations are available for those situations in which your data are insome wa abnormal. You may transform data ifrhe ITUmbers are large and unmanage-bl or if OUT data do norineet The assumptions of a statistical test. The trans orma-

ton of data to meet assumptions of a rest, however, is being done less requent ybecause inferentialsratistics tend to be robust against the effects of even moderate y

veTe violations of assumptions. Transformations should be used sparingIy becausethey change the I\arure of the variables of yoursrudy

REVIEW QUESTIONS

Why are sampling distributions important in inferGrillalstatistics.Whatissampling error, and why is it importantto know about.Z

What are degrees offreedom, and how do they relate To inferentialstatistics.3

How do parametric and nonparametric statistics differ!4

What is the general logic behind Inferentialstatistics!5

6. How are Type I and Type 11 errors related!7. What does statistical significance mean!8. When should you use a one'tailed or a two-tailed Test.9. What are the assumptions Lindenying parametric statistics.

~

,,..- ~,~~~~~

Page 18: Bordens and Abbott 2008

450

~,

I

CHAPTER 14

*

Whicl\ parametric statistics would yon Lise to analyze data froin ai\ experimentwith two grillips! Identify which statistic would be LISed for a particular type ofdesigi\ or data.

11. Which parametric statistic is 11ToSL appropriate fin' designs with Inure than onelevel of a single independent variable!

12. Whei\ would y. 11 do a planned versus at\ 11nplanned comparison, and why!13. Whatis the difference between weighted and tinweigl\ted Ineans analysis, and

when would yoiiLise each!14. What are a Inaii\ effect and ai\ interaction, and how are they analyzed!15. Under what conditions would yoLiLise a ITonparainetric statistic!16. What is Ineant by the power of a statistical test, and what factors can affect 11117. Does a statisticalIy significant finding always have practicalsignificance! Why

or why not!18. Whei\ are data transformationstised, and whatshould yoti consider when

LISing Line!

19. What are the alternatives to inferenrialsratistics for evaluating the reliabilityof data!

10

Using Infercnrial Statistics

,

,

*" A1' %*' r ':;;'

KEYTERMS

inferential statistics

standard error of the Inean

degrees offreedoiiT (df)Type I erroi

Type H error

alpha level(or)

critical region

t. " ^

I test

tiest for independentsamplest rest for correlated samplesz test for the difference between twoproportions

analysis of variance (ANCVA)

F ratio

p value

planned comparisons

tinplanned comparisons

pepcomparisoi\ erroi

fomilywise error

analysis of covariance (ANCOVA)

chi-square (X )

Mann-Whimey U test

Wilcoxon signed ranks testPOWei

effect size

data Transformation

Using MultivariateDesign and Analysis

uring discussions of experimental and nonexperimental design,previous chapters assumed that only one dependent variable was

included in a design or that multiple dependent variables were treatedseparately in any statistical tests. This approach to analysis is calleda univariate strategy. Although Inariy research questions can Ile addressed with a univariate strategy, others are best addressed by considGring variables together in a single analysis. WheiT you Incltide two ormore dependentineasuTes in asingle analysis, you are using a multivariate strategy

This chapter introduces the Inajor InLiltivariate analysis Techniques. Keep in mind that providing aiT in-depth introduction tothese techniques in the confines of one chapter is impossible. Sucha task is better suited to ai\ entire book. Also, the complex andlaborioLis calculations needed to compute Inulrivariate statistics arebetter left to computers. Consequently, this chapter does not discussthe ITTathematics behind these statistical tests except for. those casesin which some mathematical analysis is reqLiired to tindersrand Theissues. Instead, this chapter focuses o1T practical issues applicationsof the various statistics, the assumptions That must be Inet, and interpretation of results. Ifyou want to use any of the statistics disctissed inthis chapter, ^earl Using Multivariate Statistics (Tabachnick & Fidell2001) or one of the many monographs published by Sage PublicaLions (such as Asher, 1976, or Levine, 1977)

CHAPTE

Correlation al a

Multivariate De

Correlation al

ExperimentalCausal Inferer

Assumptions aMultivariate St

LinearityOutliers

Normality antMulticollinear

Error of Meas

Sample SizeMultivariate St

Factor AnalysPartial and P

Multiple RegrDISCriminant

Canonical Co

Multivariate I

Multiway Fret

Path AnalysisStructural Eq

Multivariate Ar

Note

CORRELATIONALANDEXPERIMENTALMULTIVARIATEDESIGNS

A multivariate design Is a research design In which multiple dependent or Inultiple predictor analor CTiterioiT variables are includedAnalysis of data froin such designs requires special statistical procedures. Multivariate desigi\ and analysis apply to both experimentaland correlational research studies. The following sections describesome of the available multivariate statistical rests

Summary

Review QuestIC

1<ey Terms