Effect Size


    It's the Effect Size, Stupid [1]: What effect size is and why it is important

    Paper presented at the British Educational Research Association annual conference, Exeter, 12-14 September, 2002

    Robert Coe

    School of Education, University of Durham, Leazes Road, Durham DH1 1TA. Tel 0191 334 4184; Fax 0191 334 4180; Email r.j.coe@dur.ac.uk

    Abstract

    Effect size is a simple way of quantifying the difference between two groups that has many advantages over the use of tests of statistical significance alone. Effect size emphasises the size of the difference rather than confounding this with sample size. However, primary reports rarely mention effect sizes and few textbooks, research methods courses or computer packages address the concept. This paper provides an explication of what an effect size is, how it is calculated and how it can be interpreted. The relationship between effect size and statistical significance is discussed and the use of confidence intervals for the latter outlined. Some advantages and dangers of using effect sizes in meta-analysis are discussed and other problems with the use of effect sizes are raised. A number of alternative measures of effect size are described. Finally, advice on the use of effect sizes is summarised.

    Effect size is simply a way of quantifying the size of the difference between two groups. It is easy to calculate, readily understood and can be applied to any measured outcome in Education or Social Science. It is particularly valuable for quantifying the effectiveness of a particular intervention, relative to some comparison. It allows us to move beyond the simplistic, 'Does it work or not?' to the far more sophisticated, 'How well does it work in a range of contexts?' Moreover, by placing the emphasis on the most important aspect of an intervention (the size of the effect) rather than its statistical significance (which conflates effect size and sample size), it promotes a more scientific approach to the accumulation of knowledge. For these reasons, effect size is an important tool in reporting and interpreting effectiveness.

    The routine use of effect sizes, however, has generally been limited to meta-analysis, for combining and comparing estimates from different studies, and is all too rare in original reports of educational research (Keselman et al., 1998). This is despite the fact that measures of effect size have been available for at least 60 years (Huberty, 2002), and the American Psychological Association has been officially encouraging authors to report effect sizes since 1994, but with limited success (Wilkinson et al., 1999). Formulae for the calculation of effect sizes do not appear in most statistics textbooks (other than those devoted to meta-analysis), are not featured in many statistics computer packages and are seldom taught in standard research methods courses. For these reasons, even the researcher who is convinced by the wisdom of using measures of effect size, and is not afraid to confront the orthodoxy of conventional practice, may find that it is quite hard to know exactly how to do so.

    [1] During the 1992 US Presidential election campaign, Bill Clinton's fortunes were transformed when his advisors helped him to focus on the main issue by writing 'It's the economy, stupid' on a board they put in front of him every time he went out to speak.

    The following guide is written for non-statisticians, though inevitably some equations and technical language have been used. It describes what effect size is, what it means, how it can be used and some potential problems associated with using it.

    1. Why do we need 'effect size'?

    Consider an experiment conducted by Dowson (2000) to investigate time of day effects on learning: do children learn better in the morning or afternoon? A group of 38 children were included in the experiment. Half were randomly allocated to listen to a story and answer questions about it (on tape) at 9am, the other half to hear exactly the same story and answer the same questions at 3pm. Their comprehension was measured by the number of questions answered correctly out of 20.

    The average score was 15.2 for the morning group and 17.9 for the afternoon group: a difference of 2.7. But how big a difference is this? If the outcome were measured on a familiar scale, such as GCSE grades, interpreting the difference would not be a problem. If the average difference were, say, half a grade, most people would have a fair idea of the educational significance of the effect of reading a story at different times of day. However, in many experiments there is no familiar scale available on which to record the outcomes. The experimenter often has to invent a scale or to use (or adapt) an already existing one, but generally not one whose interpretation will be familiar to most people.

    [Figure 1: panels (a) and (b)]

    One way to get over this problem is to use the amount of variation in scores to contextualise the difference. If there were no overlap at all and every single person in the afternoon group had done better on the test than everyone in the morning group, then this would seem like a very substantial difference. On the other hand, if the spread of scores were large and the overlap much bigger than the difference between the groups, then the effect might seem less significant. Because we have an idea of the amount of variation found within a group, we can use this as a yardstick against which to compare the difference. This idea is quantified in the calculation of the effect size. The concept is illustrated in Figure 1, which shows two possible ways the difference might vary in relation to the overlap. If the difference were as in graph (a) it would be very significant; in graph (b), on the other hand, the difference might hardly be noticeable.


    2. How is it calculated?

    The effect size is just the standardised mean difference between the two groups. In other words:

    $$\text{Effect Size} = \frac{[\text{Mean of experimental group}] - [\text{Mean of control group}]}{\text{Standard Deviation}} \qquad \text{(Equation 1)}$$

    If it is not obvious which of two groups is the 'experimental' (i.e. the one which was given the 'new' treatment being tested) and which the 'control' (the one given the 'standard' treatment, or no treatment, for comparison), the difference can still be calculated. In this case, the 'effect size' simply measures the difference between them, so it is important in quoting the effect size to say which way round the calculation was done.

    The 'standard deviation' is a measure of the spread of a set of values. Here it refers to the standard deviation of the population from which the different treatment groups were taken. In practice, however, this is almost never known, so it must be estimated either from the standard deviation of the control group, or from a 'pooled' value from both groups (see question 7, below, for more discussion of this).

    In Dowson's time of day effects experiment, the standard deviation (SD) = 3.3, so the effect size was (17.9 - 15.2)/3.3 = 0.8.

    3. How can effect sizes be interpreted?

    One feature of an effect size is that it can be directly converted into statements about the overlap between the two samples in terms of a comparison of percentiles. An effect size is exactly equivalent to a 'Z-score' of a standard Normal distribution. For example, an effect size of 0.8 means that the score of the average person in the experimental group is 0.8 standard deviations above the average person in the control group, and hence exceeds the scores of 79% of the control group. With the two groups of 19 in the time of day effects experiment, the average person in the 'afternoon' group (i.e. the one who would have been ranked 10th in the group) would have scored about the same as the 4th highest person in the 'morning' group. Visualising these two individuals can give quite a graphic interpretation of the difference between the two effects.

    Table I shows conversions of effect sizes (column 1) to percentiles (column 2) and the equivalent change in rank order for a group of 25 (column 3). For example, for an effect size of 0.6, the value of 73% indicates that the average person in the experimental group would score higher than 73% of a control group that was initially equivalent. If the group consisted of 25 people, this is the same as saying that the average person (i.e. ranked 13th in the group) would now be on a par with the person ranked 7th in the control group. Notice that an effect size of 1.6 would raise the average person to be level with the top ranked individual in the control group, so effect sizes larger than this are illustrated in terms of the top person in a larger group. For example, an effect size of 3.0 would bring the average person in a group of 740 level with the previously top person in the group.
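    The following is a minimal Python sketch of Equation 1. It recomputes the time of day effect size from the summary figures quoted above (means 15.2 and 17.9, SD 3.3) and converts it to a percentile using the standard Normal distribution. The function names are illustrative only. The percentile step relies on the same Normality assumption as Table I. The rank rule, (n + 1)(1 - Φ(d)), is our own approximation: it is assumed here because it reproduces the ranks quoted in the text.

    ```python
    from statistics import NormalDist

    def effect_size(mean_experimental: float, mean_control: float, sd: float) -> float:
        """Standardised mean difference (Equation 1)."""
        return (mean_experimental - mean_control) / sd

    def percent_of_control_below(d: float) -> float:
        """Percentage of a Normal control group scoring below the average
        experimental-group member: the standard Normal CDF evaluated at d."""
        return 100 * NormalDist().cdf(d)

    def equivalent_rank(d: float, group_size: int) -> int:
        """Rank (1 = highest) in a control group of `group_size` that the average
        experimental-group member would match (assumed rule: (n + 1)(1 - CDF(d)))."""
        return round((group_size + 1) * (1 - NormalDist().cdf(d)))

    # Dowson's experiment: afternoon group treated as 'experimental', morning as 'control'.
    d = effect_size(17.9, 15.2, 3.3)
    print(f"effect size = {d:.2f}")                                      # 0.82
    print(f"exceeds {percent_of_control_below(0.8):.0f}% of control")    # 79%
    print(f"matches rank {equivalent_rank(0.8, 19)} of 19")              # 4
    ```

    With d = 0 the same rank rule gives 10th of 19 and 13th of 25, which are the 'average person' positions used in the text.
    
    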

    Table I: Interpretations of effect sizes

    Column key:
    (1) Effect size
    (2) Percentage of control group who would be below average person in experimental group
    (3) Rank of person in a control group of 25 who would be equivalent to the average person in experimental group
    (4) Probability that you could guess which group a person was in from knowledge of their score
    (5) Equivalent correlation, r (= difference in percentage 'successful' in each of the two groups, BESD)
    (6) Probability that person from experimental group will be higher than person from control, if both chosen at random (= CLES)

    (1) | (2)   | (3)                      | (4)  | (5)  | (6)
    0.0 | 50%   | 13th                     | 0.50 | 0.00 | 0.50
    0.1 | 54%   | 12th                     | 0.52 | 0.05 | 0.53
    0.2 | 58%   | 11th                     | 0.54 | 0.10 | 0.56
    0.3 | 62%   | 10th                     | 0.56 | 0.15 | 0.58
    0.4 | 66%   | 9th                      | 0.58 | 0.20 | 0.61
    0.5 | 69%   | 8th                      | 0.60 | 0.24 | 0.64
    0.6 | 73%   | 7th                      | 0.62 | 0.29 | 0.66
    0.7 | 76%   | 6th                      | 0.64 | 0.33 | 0.69
    0.8 | 79%   | 6th                      | 0.66 | 0.37 | 0.71
    0.9 | 82%   | 5th                      | 0.67 | 0.41 | 0.74
    1.0 | 84%   | 4th                      | 0.69 | 0.45 | 0.76
    1.2 | 88%   | 3rd                      | 0.73 | 0.51 | 0.80
    1.4 | 92%   | 2nd                      | 0.76 | 0.57 | 0.84
    1.6 | 95%   | 1st                      | 0.79 | 0.62 | 0.87
    1.8 | 96%   | 1st                      | 0.82 | 0.67 | 0.90
    2.0 | 98%   | 1st (or 1st out of 44)   | 0.84 | 0.71 | 0.92
    2.5 | 99%   | 1st (or 1st out of 160)  | 0.89 | 0.78 | 0.96
    3.0 | 99.9% | 1st (or 1st out of 740)  | 0.93 | 0.83 | 0.98

    Another way to conceptualise the overlap is in terms of the probability that one could guess which group a person came from, based only on their test score (or whatever value was being compared). If the effect size were 0 (i.e. the two groups were the same) then the probability of a correct guess would be exactly a half, or 0.50. With a difference between the two groups equivalent to an effect size of 0.3, there is still plenty of overlap, and the probability of correctly identifying the groups rises only slightly to 0.56. With an effect size of 1, the probability is now 0.69, just over a two-thirds chance. These probabilities are shown in the fourth column of Table I. It is clear that the overlap between experimental and control groups is substantial (and therefore the probability is still close to 0.5), even when the effect size is quite large.


    A slightly different way to interpret effect sizes makes use of an equivalence between the standardised mean difference (d) and the correlation coefficient, r. If group membership is coded with a dummy variable (e.g. denoting the control group by 0 and the experimental group by 1) and the correlation between this variable and the outcome measure calculated, a value of r can be derived. By making some additional assumptions, one can readily convert d into r in general, using the equation $r^2 = d^2/(4 + d^2)$ (see Cohen, 1969, pp20-22 for other formulae and conversion table). Rosenthal and Rubin (1982) take advantage of an interesting property of r to suggest a further interpretation, which they call the binomial effect size display (BESD). If the outcome measure is reduced to a simple dichotomy (for example, whether a score is above or below a particular value such as the median, which could be thought of as 'success' or 'failure'), r can be interpreted as the difference in the proportions in each category. For example, an effect size of 0.2 indicates a difference of 0.10 in these proportions, as would be the case if 45% of the control group and 55% of the treatment group had reached some threshold of 'success'. Note, however, that if the overall proportion 'successful' is not close to 50%, this interpretation can be somewhat misleading (Strahan, 1991; McGraw, 1991). The values for the BESD are shown in column 5.

    Finally, McGraw and Wong (1992) have suggested a 'Common Language Effect Size' (CLES) statistic, which they argue is readily understood by non-statisticians (shown in column 6 of Table I). This is the probability that a score sampled at random from one distribution will be greater than a score sampled from another. They give the example of the heights of young adult males and females, which differ by an effect size of about 2, and translate this difference to a CLES of 0.92. In other words, 'in 92 out of 100 blind dates among young adults, the male will be taller than the female' (p361).

    It should be noted that the values in Table I depend on the assumption of a Normal distribution. The interpretation of effect sizes in terms of percentiles is very sensitive to violations of this assumption (see question 7, below).
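    Under the Normal assumption, each column of Table I can be computed directly from d. The Python sketch below does this. It assumes two Normal distributions with equal spread and equal group sizes. The formulas are standard results, not ones stated in the text:
    - Column 2 is Φ(d).
    - Column 4 is Φ(d/2), which corresponds to guessing by whether a score lies above the midpoint between the two means.
    - Column 5 is r = d/√(4 + d²).
    - Column 6 is Φ(d/√2), because the difference of two independent draws has standard deviation √2.
    - The rank rule for column 3 is our own approximation, chosen because it matches the published ranks.

    ```python
    from statistics import NormalDist

    PHI = NormalDist().cdf  # standard Normal cumulative distribution function

    def table_row(d: float) -> tuple:
        """One row of Table I for effect size d (Normal, equal-spread assumption)."""
        pct_below = 100 * PHI(d)                       # column 2
        rank_of_25 = max(1, round(26 * (1 - PHI(d))))  # column 3 (1 = top of 25; assumed rule)
        p_guess = PHI(d / 2)                           # column 4: classify by midpoint
        r = d / (4 + d ** 2) ** 0.5                    # column 5: r, read as BESD
        cles = PHI(d / 2 ** 0.5)                       # column 6: P(experimental > control)
        return d, pct_below, rank_of_25, p_guess, r, cles

    for d in (0.0, 0.2, 0.5, 0.8, 1.0, 2.0):
        _, pct, rank, guess, r, cles = table_row(d)
        print(f"{d:.1f}  {pct:5.1f}%  rank {rank:2d}  {guess:.2f}  {r:.2f}  {cles:.2f}")
    ```

    The 'or 1st out of N' entries for large d roughly correspond to 1/(1 - Φ(d)). For example, d = 2.0 gives about 44.
    
    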

    Another way to interpret effect sizes is to compare them to the effect sizes of differences that are familiar. For example, Cohen (1969, p23) describes an effect size of 0.2 as 'small' and gives to illustrate it the example that the difference between the heights of 15 year old and 16 year old girls in the US corresponds to an effect of this size. An effect size of 0.5 is described as 'medium' and is 'large enough to be visible to the naked eye'. A 0.5 effect size corresponds to the difference between the heights of 14 year old and 18 year old girls. Cohen describes an effect size of 0.8 as 'grossly perceptible and therefore large' and equates it to the difference between the heights of 13 year old and 18 year old girls. As a further example he states that the difference in IQ between holders of the Ph.D. degree and 'typical college freshmen' is comparable to an effect size of 0.8.

    Cohen does acknowledge the danger of using terms like 'small', 'medium' and 'large' out of context. Glass et al. (1981, p104) are particularly critical of this approach, arguing that the effectiveness of a particular intervention can only be interpreted in relation to other interventions that seek to produce the same effect. They also point out that the practical importance of an effect depends entirely on its relative costs and benefits. In education, if it could be shown that making a small and inexpensive change would raise academic achievement by an effect size of even as little as 0.1, then this could be a very significant improvement, particularly if the improvement applied uniformly to all students, and even more so if the effect were cumulative over time.

    Table II: Examples of average effect sizes from research

    Intervention                      | Outcome                               | Effect size | Source
    Reducing class size from 23 to 15 | Students' test performance in reading | 0.30        | Finn and Achilles (1990)
    Reducing class size from 23 to 15 | Students' test performance in maths   | 0.32        | Finn and Achilles (1990)
    Small (…                          | Attitudes of students                 | 0.47        | …

    … in a spelling age from 11 to 12 corresponds to an effect size of about 0.3, but seems to vary according to the particular test used.

    In England, the distributions of GCSE grades in compulsory subjects (i.e. Maths and English) have standard deviations of between 1.5 and 1.8 grades, so an improvement of one GCSE grade represents an effect size of 0.5-0.7. In the context of secondary schools, therefore, introducing a change in practice whose effect size was known to be 0.6 would result in an improvement of about a GCSE grade for each pupil in each subject. For a school in which 50% of pupils were previously gaining five or more A*-C grades, this percentage (other things being equal, and assuming that the effect applied equally across the whole curriculum) would rise to 73%. Even Cohen's 'small' effect of 0.2 would produce an increase from 50% to 58%, a difference that most schools would probably categorise as quite substantial. Olejnik and Algina (2000) give a similar example based on the Iowa Test of Basic Skills …
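    The GCSE arithmetic above can be sketched in Python. The sketch assumes that scores are Normally distributed and that every pupil's score rises by the same d standard deviations. It then asks what share now clears the same threshold. Under those assumptions the new proportion is Φ(Φ⁻¹(p) + d), where p is the baseline proportion. The function name is illustrative.

    ```python
    from statistics import NormalDist

    STD = NormalDist()  # standard Normal distribution

    def proportion_above_threshold(baseline: float, d: float) -> float:
        """Proportion clearing a fixed threshold after every pupil's score rises
        by d standard deviations, assuming Normally distributed scores."""
        z = STD.inv_cdf(baseline)  # how far the mean lies above the threshold, in SDs
        return STD.cdf(z + d)

    # One GCSE grade expressed as an effect size, for grade SDs of 1.5 and 1.8
    for sd in (1.5, 1.8):
        print(f"SD {sd}: one grade = effect size {1 / sd:.2f}")

    # A school where 50% of pupils previously gained five or more A*-C grades
    for d in (0.6, 0.2):
        print(f"d = {d}: 50% -> {100 * proportion_above_threshold(0.5, d):.0f}%")
    ```

    With a 50% baseline the threshold sits at the mean, so the result reduces to Φ(d): about 73% for d = 0.6 and 58% for d = 0.2. Other baselines give different gains for the same d.
    
    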

    Finally, the interpretation of effect sizes can be greatly helped by a few examples from existing research. Table II lists a selection of these, many of which are taken from Lipsey and Wilson (1993). The examples cited are given for illustration of the use of effect size measures; they are not intended to be the definitive judgement on the relative efficacy of different interventions. In interpreting them, therefore, one should bear in mind that most of the meta-analyses from which they are derived can be (and often have been) criticised for a variety of weaknesses, that the range of circumstances in which the effects have been found may be limited, and that the effect size quoted is an average which is often based on quite widely differing values.

    It seems to be a feature of educational interventions that very few of them have effects that would be described in Cohen's classification as anything other than 'small'. This appears particularly so for effects on student achievement. No doubt this is partly a result of the wide variation found in the population as a whole, against which the measure of effect size is calculated. One might also speculate that achievement is harder to influence than other outcomes, perhaps because most schools are already using optimal strategies, or because different strategies are likely to be effective in different situations, a complexity that is not well captured by a single average effect size.

    4. What is the relationship between 'effect size' and 'significance'?

    Effect size quantifies the size of the difference between two groups, and may therefore be said to be a true measure of the significance of the difference. If, for example, the results of Dowson's 'time of day effects' experiment were found to apply generally, we might ask the question: 'How much difference would it make to children's learning if they were taught a particular topic in the afternoon instead of the morning?' The best answer we could give to this would be in terms of the effect size.

    However, in statistics the word 'significance' is often used to mean 'statistical significance', which is the likelihood that the difference between the two groups could just be an accident of sampling. If you take two samples from the same population there will always be a difference between them. The statistical significance is usually calculated as a 'p-value', the probability that a difference of at least the same size would have arisen by chance, even if there really were no difference between the two populations. For differences between the means of two groups, this p-value would normally be calculated from a 't-test'. By convention, if p …

    … of the sample. One would get a 'significant' result either if the effect were very big (despite having only a small sample) or if the sample were very big (even if the actual effect size were tiny). It is important to know the statistical significance of a result, since without it there is a danger of drawing firm conclusions from studies where the sample is 'too small' to justify such confidence. However, statistical significance does not tell you the most important thing: the size of the effect. One way to overcome this confusion is to report the effect size, together with an estimate of its likely 'margin for error' or 'confidence interval'.
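    The dependence of significance on both effect size and sample size can be made concrete with a standard identity, which the text itself does not give. When d uses the pooled standard deviation, the two-sample t statistic is t = d·√(N_E·N_C/(N_E + N_C)). The Python sketch below applies it to two cases: Dowson's d = 0.8 with 19 per group, and a hypothetical d = 0.1 with 1,000 per group. Both give t above about 2. That is close to the 5% two-tailed critical values (about 2.03 for 36 degrees of freedom, 1.96 for large samples), despite an eightfold difference in effect size.

    ```python
    import math

    def t_statistic(d: float, n_e: int, n_c: int) -> float:
        """Two-sample t statistic implied by a pooled-SD effect size d."""
        return d * math.sqrt(n_e * n_c / (n_e + n_c))

    print(f"{t_statistic(0.8, 19, 19):.2f}")      # large effect, small sample: 2.47
    print(f"{t_statistic(0.1, 1000, 1000):.2f}")  # tiny effect, huge sample: 2.24
    print(f"{t_statistic(0.1, 20, 20):.2f}")      # tiny effect, small sample: 0.32
    ```
    
    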

    5. What is the margin for error in estimating effect sizes?

    Clearly, if an effect size is calculated from a very large sample it is likely to be more accurate than one calculated from a small sample. This 'margin for error' can be quantified using the idea of a 'confidence interval', which provides the same information as is usually contained in a significance test: using a '95% confidence interval' is equivalent to taking a '5% significance level'. To calculate a 95% confidence interval, you assume that the value you got (e.g. the effect size estimate of 0.8) is the 'true' value, but calculate the amount of variation in this estimate you would get if you repeatedly took new samples of the same size (i.e. different samples of 38 children). For every 100 of these hypothetical new samples, by definition, 95 would give estimates of the effect size within the '95% confidence interval'. If this confidence interval includes zero, then that is the same as saying that the result is not statistically significant. If, on the other hand, zero is outside the range, then it is 'statistically significant at the 5% level'. Using a confidence interval is a better way of conveying this information since it keeps the emphasis on the effect size (which is the important information) rather than the p-value.

    A formula for calculating the confidence interval for an effect size is given by Hedges and Olkin (1985, p86). If the effect size estimate from the sample is d, then it is (approximately) Normally distributed, with standard deviation:

    $$\sigma[d] = \sqrt{\frac{N_E + N_C}{N_E N_C} + \frac{d^2}{2(N_E + N_C)}} \qquad \text{(Equation 2)}$$

    (Where $N_E$ and $N_C$ are the numbers in the experimental and control groups, respectively.)

    Hence a 95% confidence interval for d would be from

    $$d - 1.96\,\sigma[d] \quad \text{to} \quad d + 1.96\,\sigma[d] \qquad \text{(Equation 3)}$$

    To use the figures from the time of day experiment again, $N_E = N_C = 19$ and $d = 0.8$, so $\sigma[d] = \sqrt{0.105 + 0.008} = 0.34$. Hence the 95% confidence interval is [0.14, 1.46]. This would normally be interpreted (despite the fact that such an interpretation is not strictly justified; see Oakes, 1986 for an enlightening discussion of this) as meaning that the 'true' effect of time of day is very likely to be between 0.14 and 1.46. In other words, it is almost certainly positive (i.e. afternoon is better than morning) and the difference may well be quite large.
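    The following minimal Python sketch implements Equations 2 and 3. Applied to the time of day figures (d = 0.8, 19 pupils per group), it reproduces σ[d] = 0.34 and the interval [0.14, 1.46]. The function names are illustrative. The 1.96 multiplier is the usual Normal value for 95% coverage and can be changed through the `z` parameter.

    ```python
    import math

    def effect_size_se(d: float, n_e: int, n_c: int) -> float:
        """Approximate standard deviation of the effect size estimate (Equation 2)."""
        return math.sqrt((n_e + n_c) / (n_e * n_c) + d ** 2 / (2 * (n_e + n_c)))

    def confidence_interval(d: float, n_e: int, n_c: int, z: float = 1.96):
        """Confidence interval for d (Equation 3); 95% with the default z."""
        se = effect_size_se(d, n_e, n_c)
        return d - z * se, d + z * se

    se = effect_size_se(0.8, 19, 19)
    low, high = confidence_interval(0.8, 19, 19)
    print(f"sigma[d] = {se:.2f}; 95% CI [{low:.2f}, {high:.2f}]")  # 0.34; [0.14, 1.46]
    ```
    
    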

    6. How can knowledge about effect sizes be combined?

    One of the main advantages of using effect size is that when a particular experiment has been replicated, the different effect size estimates from each study can easily be combined to give an overall best estimate of the size of the effect. This process of synthesising experimental results into a single effect size estimate is known as 'meta-analysis'. It was developed in its current form by an educational statistician, Gene Glass (see Glass et al., 1981), though the roots of meta-analysis can be traced a good deal further back (see Lepper et al., 1999), and is now widely used, not only in education, but in medicine and throughout the social sciences. A brief and accessible introduction to the idea of meta-analysis can be found in Fitz-Gibbon (1984).

    Meta-analysis, however, can do much more than simply produce an overall 'average' effect size, important though this often is. If, for a particular intervention, some studies produced large effects, and some small effects, it would be of limited value simply to combine them together and say that the average effect was 'medium'. Much more useful would be to examine the original studies for any differences between those with large and small effects and to try to understand what factors might account for the difference. The best meta-analysis, therefore, involves seeking relationships between effect sizes and characteristics of the intervention, the context and study design in which they were found (Rubin, 1992; see also Lepper et al. (1999) for a discussion of the problems that can be created by failing to do this, and some other limitations of the applicability of meta-analysis).

    The importance of replication in gaining evidence about what works cannot be overstressed. In Dowson's time of day experiment the effect was found to be large enough to be statistically and educationally significant. Because we know that the pupils were allocated randomly to each group, we can be confident that chance initial differences between the two groups are very unlikely to account for the difference in the outcomes. Furthermore, the use of a pre-test of both groups before the intervention makes this even less likely. However, we cannot rule out the possibility that the difference arose from some characteristic peculiar to the children in this particular experiment. For example, if none of them had had any breakfast that day, this might account for the poor performance of the morning group. However, the result would then presumably not generalise to the wider population of school students, most of whom would have had some breakfast. Alternatively, the effect might depend on the age of the students. Dowson's students were aged 7 or 8; it is quite possible that the effect could be diminished or reversed with older (or younger) students. This illustrates the danger of implementing policy on the basis of a single experiment. Confidence in the generality of a result can only follow widespread replication.

    An important consequence of the capacity of meta-analysis to combine results is that even small studies can make a significant contribution to knowledge. The kind of experiment that can be done by a single teacher in a school might involve a total of fewer than 30 students. Unless the effect is huge, a study of this size is most unlikely to get a statistically significant result. According to conventional statistical wisdom, therefore, the experiment is not worth doing. However, if the results of several such experiments are combined using meta-analysis, the overall result is likely to be highly statistically significant. Moreover, it will have the important strengths of being derived from a range of contexts (thus increasing confidence in its generality) and from real-life working practice (thereby making it more likely that the policy is feasible and can be implemented authentically).
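    This section gives no pooling formula. The Python sketch below therefore substitutes the standard fixed-effect (inverse-variance) approach: each study's d is weighted by the reciprocal of its sampling variance, the square of Equation 2. The five classroom studies, each with 14 pupils per group (28 in total), are hypothetical. With these numbers, a single study's 95% interval includes zero, while the pooled interval lies entirely above zero.

    ```python
    import math

    def effect_size_variance(d, n_e, n_c):
        """Sampling variance of d: the square of Equation 2."""
        return (n_e + n_c) / (n_e * n_c) + d ** 2 / (2 * (n_e + n_c))

    def fixed_effect_pool(studies):
        """Inverse-variance weighted mean of effect sizes with a 95% CI.
        `studies` is a list of (d, n_e, n_c) tuples."""
        weights = [1 / effect_size_variance(d, ne, nc) for d, ne, nc in studies]
        pooled = sum(w * d for w, (d, _, _) in zip(weights, studies)) / sum(weights)
        se = math.sqrt(1 / sum(weights))
        return pooled, pooled - 1.96 * se, pooled + 1.96 * se

    # Hypothetical small classroom experiments: (d, n_experimental, n_control)
    studies = [(0.4, 14, 14), (0.5, 14, 14), (0.3, 14, 14), (0.6, 14, 14), (0.45, 14, 14)]

    single_se = math.sqrt(effect_size_variance(0.4, 14, 14))
    print(f"one study: [{0.4 - 1.96 * single_se:.2f}, {0.4 + 1.96 * single_se:.2f}]")
    pooled, low, high = fixed_effect_pool(studies)
    print(f"pooled: d = {pooled:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
    ```

    A fixed-effect model assumes every study estimates the same underlying effect. Where effects plausibly differ between contexts, a random-effects model is the usual alternative.
    
    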

    One final caveat should be made here about the danger of combining incommensurable results. Given two (or more) numbers, one can always calculate an average. However, if they are effect sizes from experiments that differ significantly in terms of the outcome measures used, then the result may be totally meaningless. It can be very tempting, once effect sizes have been calculated, to treat them as all the same and lose sight of their origins. Certainly, there are plenty of examples of meta-analyses in which the juxtaposition of effect sizes is somewhat questionable.

    In comparing (or combining) effect sizes, one should therefore consider carefully whether they relate to the same outcomes. This advice applies not only to meta-analysis, but to any other comparison of effect sizes. Moreover, because of the sensitivity of effect size estimates to reliability and range restriction (see below), one should also consider whether those outcome measures are derived from the same (or sufficiently similar) instruments and the same (or sufficiently similar) populations.

    It is also important to compare only like with like in terms of the treatments used to create the differences being measured. In the education literature, the same name is often given to interventions that are actually very different, for example, if they are operationalised differently, or if they are simply not well enough defined for it to be clear whether they are the same or not. It could also be that different studies have used the same well-defined and operationalised treatments, but the actual implementation differed, or that the same treatment may have had different levels of intensity in different studies. In any of these cases, it makes no sense to average out their effects.

    7. What other factors can influence effect size?

    Although effect size is a simple and readily interpreted measure of effectiveness, it can also be sensitive to a number of spurious influences, so some care needs to be taken in its use. Some of these issues are outlined here.

    Which 'standard deviation'?

    The first problem is the issue of which 'standard deviation' to use. Ideally, the control group will provide the best estimate of standard deviation, since it consists of a representative group of the population who have not been affected by the experimental intervention. However, unless the control group is very large, the estimate of the 'true' population standard deviation derived from only the control group is likely to be appreciably less accurate than an estimate derived from both the control and experimental groups. Moreover, in studies where there is not a true 'control' group (for example the time of day effects experiment), it may be an arbitrary decision which group's standard deviation to use, and it will often make an appreciable difference to the estimate of effect size.

    For these reasons, it is often better to use a 'pooled' estimate of standard deviation. The pooled estimate is essentially an average of the standard deviations of the experimental and control groups (Equation 4). Note that this is not the same as the standard deviation of all the values in both groups 'pooled' together. If, for example, each group had a low standard deviation but the two means were substantially different, the true pooled estimate (as calculated by Equation 4) would be much lower than the value obtained by pooling all the values together and calculating the standard deviation. The implications of choices about which standard deviation to use are discussed by Olejnik and Algina (2000).

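    $$SD_{\text{pooled}} = \sqrt{\frac{(N_E - 1)\,SD_E^2 + (N_C - 1)\,SD_C^2}{N_E + N_C - 2}} \qquad \text{(Equation 4)}$$

    (Where $N_E$ and $N_C$ are the numbers in the experimental and control groups, respectively, and $SD_E$ and $SD_C$ are their standard deviations.)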

    The use of a pooled estimate of standard deviation depends on the assumption that the two calculated standard deviations are estimates of the same population value. In other words, it assumes that the experimental and control group standard deviations differ only as a result of sampling variation. Where this assumption cannot be made (either because there is some reason to believe that the two standard deviations are likely to be systematically different, or because the actual measured values are very different), a pooled estimate should not be used.

    In the example of Dowson's time of day experiment, the standard deviations for the morning and afternoon groups were 4.12 and 2.10 respectively. With $N_E = N_C = 19$, Equation 4 therefore gives $SD_{\text{pooled}}$ as 3.3, which was the value used in Equation 1 to give an effect size of 0.8. However, the difference between the two standard deviations seems quite large in this case. Given that the afternoon group mean was 17.9 out of 20, it seems likely that its standard deviation may have been reduced by a 'ceiling effect', i.e. the spread of scores was limited by the maximum available mark of 20. In this case, therefore, it might be more appropriate to use the morning group's standard deviation as the best estimate. Doing this will reduce the effect size to 0.7, and it then becomes a somewhat arbitrary decision which value of the effect size to use. A general rule of thumb in statistics when two valid methods give different answers is: 'If in doubt, cite both.'
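    The following minimal Python sketch implements Equation 4 and reproduces the Dowson figures just quoted: a pooled SD of 3.3 gives d ≈ 0.8, and the morning SD alone gives d ≈ 0.7. The second half uses two small, hypothetical groups with tight spreads but very different means. It shows why pooling the two standard deviations is not the same as taking the standard deviation of all the scores together.

    ```python
    import math
    import statistics

    def pooled_sd(sd_e: float, sd_c: float, n_e: int, n_c: int) -> float:
        """Pooled standard deviation (Equation 4)."""
        return math.sqrt(((n_e - 1) * sd_e ** 2 + (n_c - 1) * sd_c ** 2) / (n_e + n_c - 2))

    # Dowson: afternoon SD 2.10, morning SD 4.12, 19 pupils in each group; mean gap 2.7.
    sd_p = pooled_sd(2.10, 4.12, 19, 19)
    print(f"pooled SD = {sd_p:.1f}")              # 3.3
    print(f"d (pooled SD)  = {2.7 / sd_p:.1f}")   # 0.8
    print(f"d (morning SD) = {2.7 / 4.12:.1f}")   # 0.7

    # Hypothetical scores: two tight groups whose means are far apart.
    group_a = [10, 11, 12, 11, 10]
    group_b = [18, 19, 20, 19, 18]
    within = pooled_sd(statistics.stdev(group_a), statistics.stdev(group_b), 5, 5)
    overall = statistics.stdev(group_a + group_b)  # the between-group gap inflates this
    print(f"pooled {within:.2f} vs all values together {overall:.2f}")
    ```
    
    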

    Corrections for bias

    Although using the pooled standard deviation to calculate the effect size generally gives a better estimate than the control group SD, it is still unfortunately slightly biased and in general gives a value slightly larger than the true population value (Hedges and Olkin, 1985). Hedges and Olkin (1985, p80) give a formula which provides an approximate correction to this bias.

    In Dowson's experiment with 38 values, the correction factor will be 0.98, so it makes very little difference, reducing the effect size estimate from 0.82 to 0.80. Given the likely accuracy of the figures on which this is based, it is probably only worth quoting one decimal place, so the figure of 0.8 stands. In fact, the correction only becomes significant for small samples, in which the accuracy is anyway much less. It is therefore hardly worth worrying about it in primary reports of empirical results. However, in meta-analysis, where results from primary studies are combined, the correction is important, since without it this bias would be accumulated.
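    The Hedges and Olkin formula is not reproduced in this section. The Python sketch below assumes the widely quoted approximation J = 1 - 3/(4(N_E + N_C) - 9). That approximation yields the factor of 0.98 mentioned above for 38 pupils, and it approaches 1 as samples grow.

    ```python
    def hedges_correction(n_e: int, n_c: int) -> float:
        """Approximate small-sample bias correction factor for d
        (assumed form: 1 - 3 / (4 * (n_e + n_c) - 9))."""
        return 1 - 3 / (4 * (n_e + n_c) - 9)

    j = hedges_correction(19, 19)
    print(f"correction factor = {j:.2f}")   # 0.98
    print(f"corrected d = {0.82 * j:.2f}")  # 0.80

    # The correction matters most for small groups
    for n in (5, 10, 50):
        print(f"{n} per group: factor {hedges_correction(n, n):.3f}")
    ```
    
    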

    Restricted range

    Suppose the time of day effects experiment were to be repeated, once with the top set in a highly selective school and again with a mixed ability group in a comprehensive. If students were allocated to morning and afternoon groups at random, the respective differences between them might be the same in each case; both means in the selective school might be higher, but the difference between the two groups could be the same as the difference in the comprehensive. However, it is unlikely that the standard deviations would be the same. The spread of scores found within the highly selected group would be much less than that in a true cross-section of the population, as for example in the mixed ability comprehensive class. This, of course, would have a substantial impact on the calculation of the effect size. With the highly restricted range found in the selective school, the effect size would be much larger than that found in the comprehensive.

    Ideally, in calculating effect size one should use the standard deviation of the full population, in order to make comparisons fair. However, there will be many cases in which unrestricted values are not available, either in practice or in principle. For example, in considering the effect of an intervention with university students, or with pupils with reading difficulties, one must remember that these are restricted populations. In reporting the effect size, one should draw attention to this fact; if the amount of restriction can be quantified it may be possible to make allowance for it. Any comparison with effect sizes calculated from a full-range population must be made with great caution, if at all.
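    How much range restriction can inflate d is shown in the Python sketch below. It rests on three assumptions that are ours, not the text's:
    - The full population is Normal.
    - A 'top set' is everyone above a sharp cut-off.
    - The intervention adds the same raw amount to every score.

    The SD of the selected group then follows from the standard truncated-Normal variance formula, 1 + aλ - λ², where λ is the inverse Mills ratio. The 0.3 SD gain is a hypothetical value.

    ```python
    from statistics import NormalDist

    def selected_sd(top_fraction: float) -> float:
        """SD, in full-population SD units, of the top `top_fraction` of a
        standard Normal population (truncated-Normal variance formula)."""
        std = NormalDist()
        a = std.inv_cdf(1 - top_fraction)   # selection cut-off
        lam = std.pdf(a) / top_fraction     # inverse Mills ratio (mean of selected group)
        return (1 + a * lam - lam ** 2) ** 0.5

    raw_difference = 0.3   # hypothetical gain, in full-population SD units
    print(f"full population: d = {raw_difference:.2f}")
    for frac in (0.5, 0.2, 0.1):
        sd = selected_sd(frac)
        print(f"top {frac:.0%}: SD = {sd:.2f}, d = {raw_difference / sd:.2f}")
    ```

    Under these assumptions, restricting to the top 20% roughly halves the SD and so roughly doubles the effect size. Multiplying a restricted-group d by `selected_sd(frac)` is one way the 'allowance' mentioned above might be made, if the selection fraction is known.
    
    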

    Non-Normal distributions

    The interpretations of effect sizes given in Table I depend on the assumption that both control and experimental groups have a 'Normal' distribution, i.e. the familiar 'bell-shaped' curve, shown, for example, in Figure 1. Needless to say, if this assumption is not true then the interpretation may be altered, and in particular, it may be difficult to make a fair comparison between an effect size based on Normal distributions and one based on non-Normal distributions.

    [Figure 2: Comparison of Normal and non-Normal distributions. Curves: 'Standard Normal Distribution (S.D. = 1)' and 'Similar looking distribution with fatter extremes (S.D. = 3.3)']

    An illustration of this is given in Figure 2, which shows the frequency curves for two distributions, one of them Normal, the other a 'contaminated normal' distribution (Wilcox, 1998), which is similar in shape, but with somewhat fatter extremes. In fact, the latter does look just a little more spread out than the Normal distribution, but its standard deviation is actually over three times as big. The consequence of this in terms of effect size differences is shown in Figure 3. Both graphs show distributions that differ by an effect size equal to 1, but the appearance of the effect size difference from the graphs is rather dissimilar. In graph (b), the separation between experimental and control groups seems much larger, yet the effect size is actually the same as for the Normal distributions plotted in graph (a). In terms of the amount of overlap, in graph (b) 97% of the 'experimental' group are above the control group mean, compared with the value of 84% for the Normal distribution of graph (a) (as given in Table I). This is quite a substantial difference and illustrates the danger of using the values in Table I when the distribution is not known to be Normal.

    [Figure 3: Normal and non-Normal distributions with effect size = 1, panels (a) and (b)]
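    The parameters of the contaminated Normal are not stated in the text. The Python sketch below assumes a common parameterisation: 90% of values from N(0, 1) and 10% from N(0, 10²). That gives a standard deviation of √10.9 ≈ 3.3, the same as in Figure 2. Shifting the 'experimental' group up by one standard deviation (effect size = 1) then leaves well over 90% of it above the control mean, compared with about 84% under Normality. The exact percentage depends on the mixture chosen.

    ```python
    import math
    from statistics import NormalDist

    P_CONTAM, SCALE = 0.1, 10.0   # assumed mixture: 90% N(0, 1), 10% N(0, 10^2)
    std = NormalDist()

    def mixture_cdf(x: float) -> float:
        """CDF of the contaminated Normal distribution, centred on 0."""
        return (1 - P_CONTAM) * std.cdf(x) + P_CONTAM * std.cdf(x / SCALE)

    mixture_sd = math.sqrt((1 - P_CONTAM) * 1 ** 2 + P_CONTAM * SCALE ** 2)
    print(f"SD of mixture = {mixture_sd:.2f}")   # 3.30

    # Experimental group = control distribution shifted up by one mixture SD (d = 1).
    # Share above the control mean (0) = P(X > -SD) for the unshifted mixture.
    above_mean = 1 - mixture_cdf(-mixture_sd)
    print(f"contaminated: {above_mean:.1%} above control mean; Normal: {std.cdf(1):.1%}")
    ```
    
    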

    Measurement reliability

    A third factor that can spuriously affect an effect size is the reliability of the measurement on which it is based. According to classical measurement theory, any measure of a particular outcome may be considered to consist of the 'true' underlying value, together with a component of 'error'. The problem is that the amount of variation in measured scores for a particular sample (i.e. its standard deviation) will depend on both the variation in underlying scores and the amount of error in their measurement.

    To give an example, imagine the time of day experiment were conducted twice with two (hypothetically) identical samples of students. In the first version the test used to assess their comprehension consisted of just 10 items and their scores were converted into a percentage. In the second version a test with 50 items was used, and again converted to a percentage. The two tests were of equal difficulty and the actual effect of the difference in time of day was the same in each case, so the respective mean percentages of the morning and afternoon groups were the same for both versions. However, it is almost always the case that a longer test will be more reliable, and hence the standard deviation of the percentages on the 50 item test will be lower than the standard deviation for the 10 item test. Thus, although the true effect was the same, the calculated effect sizes will be different.

    In interpreting an effect size, it is therefore important to know the reliability of the measurement from which it was calculated. This is one reason why the reliability of any outcome measure used should be reported. It is theoretically possible to make a correction for unreliability (sometimes called 'attenuation'), which gives an estimate of what the effect size would have been, had the reliability of the test been perfect. However, in practice the effect of this is rather alarming, since the worse the test was, the more you increase the estimate of the effect size. Moreover, estimates of reliability are dependent on the particular population in which the test was used, and are themselves anyway subject to sampling error. For further discussion of the impact of reliability on effect sizes, see Baugh (2002).
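    The 10-item versus 50-item example can be sketched in Python using two standard classical test theory results that the text does not spell out:
    - An observed effect size equals the true one multiplied by √(reliability).
    - The Spearman-Brown formula predicts the reliability of a lengthened test.

    The 10-item reliability of 0.6 and the 'true' d of 0.8 are hypothetical values. The last loop shows the 'alarming' behaviour of the correction for attenuation: the less reliable the test, the larger the upward adjustment.

    ```python
    import math

    def spearman_brown(reliability: float, length_factor: float) -> float:
        """Predicted reliability when a test is lengthened by `length_factor`."""
        return length_factor * reliability / (1 + (length_factor - 1) * reliability)

    def observed_d(true_d: float, reliability: float) -> float:
        """Effect size on an unreliable test: measurement error inflates the SD,
        so d shrinks by a factor sqrt(reliability)."""
        return true_d * math.sqrt(reliability)

    def disattenuated_d(d: float, reliability: float) -> float:
        """Correction for attenuation: estimated d for a perfectly reliable test."""
        return d / math.sqrt(reliability)

    true_d = 0.8                          # hypothetical 'true' effect
    rel_10 = 0.6                          # assumed reliability of the 10-item test
    rel_50 = spearman_brown(rel_10, 5)    # 50 items = five times as long
    print(f"10 items: reliability {rel_10:.2f}, d = {observed_d(true_d, rel_10):.2f}")
    print(f"50 items: reliability {rel_50:.2f}, d = {observed_d(true_d, rel_50):.2f}")

    # Same observed d = 0.5, corrected under progressively worse reliabilities
    for rel in (0.9, 0.7, 0.5):
        print(f"reliability {rel}: corrected d = {disattenuated_d(0.5, rel):.2f}")
    ```
    
    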


    8. Are there alternative measures of effect size?

    A number of statistics are sometimes proposed as alternative measures of effect size, other than the 'standardised mean difference'. Some of these will be considered here.

    Proportion of variance accounted for

    If the correlation between two variables is r, the square of this value (often denoted with a capital letter: R²) represents the proportion of the variance in each that is accounted for by the other. In other words, this is the proportion by which the variance of the outcome measure is reduced when it is replaced by the variance of the residuals from a regression equation. This idea can be extended to multiple regression (where it represents the proportion of the variance accounted for by all the independent variables together) and has close analogies in ANOVA (where it is usually called 'eta-squared', η²). The calculation of r (and hence R²) for the kind of experimental situation we have been considering has already been referred to above.
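For two groups, eta-squared from a one-way ANOVA and the square of the (point-biserial) correlation between a 0/1 group code and the outcome are the same quantity. The sketch below checks this on a small set of made-up scores; the data and the function names are hypothetical.

```python
import statistics


def eta_squared(groups):
    """Between-groups sum of squares / total sum of squares (one-way ANOVA)."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_values)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from deviations about the means."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical scores for two groups.
control = [12, 15, 11, 14, 13, 10]
treatment = [16, 14, 18, 15, 17, 13]

# Code group membership as 0/1 and correlate it with the scores.
group_code = [0] * len(control) + [1] * len(treatment)
r = pearson_r(group_code, control + treatment)
print(f"r^2 = {r ** 2:.4f}, eta^2 = {eta_squared([control, treatment]):.4f}")
```

Both routes give 27/62, printed as 0.4355, for these scores.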

    Because R² has this ready convertibility, it (or alternative measures of variance accounted for) is sometimes advocated as a universal measure of effect size (e.g. Thompson, 1999). One disadvantage of such an approach is that effect size measures based on variance accounted for suffer from a number of technical limitations, such as sensitivity to violation of assumptions (heterogeneity of variance, balanced designs), and their standard errors can be large (Olejnik and Algina, 2000). They are also generally more statistically complex and hence perhaps less easily understood. Further, they are non-directional: two studies with precisely opposite results would report exactly the same variance accounted for. However, there is a more fundamental objection to the use of what is essentially a measure of association to indicate the strength of an 'effect'.
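The non-directionality is easy to see numerically. A minimal sketch, assuming two groups of equal size so that the standard conversion r = d / √(d² + 4) applies; the function name is illustrative. Results of d = +0.5 and d = -0.5 give correlations of opposite sign (about ±0.243) but an identical R² of about 0.059.

```python
import math


def d_to_r(d):
    """Point-biserial r from a standardised mean difference,
    assuming two groups of equal size: r = d / sqrt(d^2 + 4)."""
    return d / math.sqrt(d ** 2 + 4)


for d in (0.5, -0.5):  # two studies with exactly opposite results
    r = d_to_r(d)
    print(f"d = {d:+.1f}  r = {r:+.3f}  R^2 = {r ** 2:.3f}")
```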

    Expressing different measures in terms of the same statistic can hide important differences between them; in fact, these different 'effect sizes' are fundamentally different, and should not be confused. The crucial difference between an effect size calculated from an experiment and one calculated from a correlation is in the causal nature of the claim that is being made for it. Moreover, the word 'effect' has an inherent implication of causality: talking about 'the effect of A on B' does suggest a causal relationship rather than just an association. Unfortunately, however, the word 'effect' is often used when no explicit causal claim is being made, but its implication is sometimes allowed to float in and out of the meaning, taking advantage of the ambiguity to suggest a subliminal causal link where none is really justified.

    This kind of confusion is so widespread in education that it is recommended here that the word 'effect' (and therefore 'effect size') should not be used unless a deliberate and explicit causal claim is being made. When no such claim is being made, we may talk about the 'variance accounted for' (R²) or the 'strength of association' (r), or simply (and perhaps most informatively) just cite the regression coefficient (Tukey, 1969). If a causal claim is being made it should be explicit and justification provided. Fitz-Gibbon (2002) has recommended an alternative approach to this problem. She has suggested a system of nomenclature for different kinds of effect sizes that clearly distinguishes between effect sizes derived from, for example, randomised-controlled, quasi-experimental and correlational studies.

    Other measures of effect size

    It has been shown that the interpretation of the 'standardised mean difference' measure of effect size is very sensitive to violations of the assumption of normality. For this reason, a number of more robust (non-parametric) alternatives have been suggested. An example of these is given by Cliff (1993). There are also effect size measures for multivariate outcomes. A detailed explanation can be found in Olejnik and Algina (2000). In addition, a method for calculating effect sizes within multilevel models has been proposed by Tymms et al. (1997). Good summaries of many of the different kinds of effect size measures that can be used and the relationships among them can be found in Snyder and Lawson (1993), Rosenthal (1994) and Kirk (1996).
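As an illustration of the robust, ordinal alternatives mentioned above, the sketch below computes the dominance statistic described by Cliff (1993), often called Cliff's delta: the proportion of (treatment, control) pairs in which the treatment score is higher, minus the proportion in which it is lower. The scores are made up and the function name is illustrative; this is a plain pairwise count rather than an optimised implementation.

```python
def cliffs_delta(xs, ys):
    """Dominance statistic: P(X > Y) - P(X < Y) over all (x, y) pairs.

    Ranges from -1 (every y exceeds every x) to +1 (every x exceeds every y).
    Only the ordering of values is used, so no normality assumption is needed.
    """
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Hypothetical scores.
treatment = [16, 14, 18, 15, 17, 13]
control = [12, 15, 11, 14, 13, 10]
print(f"Cliff's delta = {cliffs_delta(treatment, control):.3f}")  # 27/36 = 0.750
```

Because only the ordering of scores matters, any strictly increasing rescaling of the outcome (for example taking logarithms of positive scores) leaves the statistic unchanged.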

    Finally, a common effect size measure widely used in medicine is the 'odds ratio'. This is appropriate where an outcome is dichotomous: success or failure, a patient survives or does not. Explanations of the odds ratio can be found in a number of medical statistics texts, including Altman (1991), and in Fleiss (1994).
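A minimal sketch of the odds ratio for a 2×2 table of a dichotomous outcome: the odds of the outcome in the treated group divided by the odds in the control group, so that values above 1 favour the treated group. The trial counts and the function name are hypothetical.

```python
def odds_ratio(treat_yes, treat_no, ctrl_yes, ctrl_no):
    """Odds ratio for a 2x2 table: (treat_yes / treat_no) / (ctrl_yes / ctrl_no)."""
    if treat_no == 0 or ctrl_yes == 0:
        raise ValueError("a zero cell in the denominator makes the odds ratio infinite or undefined")
    return (treat_yes * ctrl_no) / (treat_no * ctrl_yes)


# Hypothetical trial: 30 of 50 treated patients survive vs 20 of 50 controls.
print(f"odds ratio = {odds_ratio(30, 20, 20, 30):.2f}")  # odds 1.5 vs 0.667 -> 2.25
```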

    Conclusions

    Advice on the use of effect sizes can be summarised as follows:

    Effect size is a standardised, scale-free measure of the relative size of the effect of an intervention. It is particularly useful for quantifying effects measured on unfamiliar or arbitrary scales and for comparing the relative sizes of effects from different studies.

    Interpretation of effect size generally depends on the assumptions that 'control' and 'experimental' group values are Normally distributed and have the same standard deviations. Effect sizes can be interpreted in terms of the percentiles or ranks at which two distributions overlap, in terms of the likelihood of identifying the source of a value, or with reference to known effects or outcomes.

    Use of an effect size with a confidence interval conveys the same information as a test of statistical significance, but with the emphasis on the significance of the effect, rather than the sample size.

    Effect sizes (with confidence intervals) should be calculated and reported in primary studies as well as in meta-analyses.

    Interpretation of standardised effect sizes can be problematic when a sample has restricted range or does not come from a Normal distribution, or if the measurement from which it was derived has unknown reliability.

    The use of an 'unstandardised' mean difference (i.e. the raw difference between the two groups, together with a confidence interval) may be preferable when:

    - the outcome is measured on a familiar scale
    - the sample has a restricted range
    - the parent population is significantly non-Normal
    - control and experimental groups have appreciably different standard deviations
    - the outcome measure has very low or unknown reliability

    Care must be taken in comparing or aggregating effect sizes based on different outcomes, different operationalisations of the same outcome, different treatments, or levels of the same treatment, or measures derived from different populations.

    The word 'effect' conveys an implication of causality, and the expression 'effect size' should therefore not be used unless this implication is intended and can be justified.
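Several of these points can be pulled together in a short calculation. The sketch below assumes Normal distributions with equal standard deviations and uses the common large-sample approximation to the standard error of d (as given, for example, by Hedges and Olkin, 1985); the sample sizes, the value of d and the function names are hypothetical. It reports an approximate 95% confidence interval alongside two of the interpretations listed above: the percentage of the control group below the average member of the experimental group, Φ(d), and the probability of correctly guessing which group a score came from when the midpoint between the means is used as the cut-off, Φ(d/2).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def d_confidence_interval(d, n1, n2, level=0.95):
    """Approximate CI for d using the large-sample standard error
    sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    z = STD_NORMAL.inv_cdf(0.5 + level / 2)
    return d - z * se, d + z * se


def percent_control_below_treatment_mean(d):
    """Percentage of the control group scoring below the experimental-group
    mean, assuming Normal distributions with equal SDs."""
    return 100 * STD_NORMAL.cdf(d)


def probability_of_guessing_group(d):
    """Chance of guessing group membership correctly from a single score,
    using the midpoint between the two means as the cut-off."""
    return STD_NORMAL.cdf(d / 2)


d, n1, n2 = 0.5, 40, 40  # hypothetical study
lo, hi = d_confidence_interval(d, n1, n2)
print(f"d = {d}, approximate 95% CI ({lo:.2f}, {hi:.2f})")
print(f"average experimental-group member exceeds "
      f"{percent_control_below_treatment_mean(d):.0f}% of the control group")
print(f"probability of a correct guess = {probability_of_guessing_group(d):.2f}")
```

With d = 0.5 and 40 per group, the interval runs from roughly 0.05 to 0.95, the average experimental-group member sits above about 69% of the control group, and the chance of a correct guess is about 0.60.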

    ¹ This calculation is derived from a probit transformation (Glass et al., 1981, p. 136), based on the assumption of an underlying normally distributed variable measuring academic attainment, some threshold of which is equivalent to a student achieving 5+ A*-Cs. Percentages for the change from a starting value of 50% for other effect size values can be read directly from Table I. Alternatively, if $\Phi(z)$ is the standard normal cumulative distribution function, $p_1$ is the proportion achieving a given threshold and $p_2$ the proportion to be expected after a change with effect size $d$, then

    $$p_2 = \Phi\left(\Phi^{-1}(p_1) + d\right)$$
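The footnote's formula translates directly into code using the standard Normal distribution in Python's standard library (`statistics.NormalDist`, Python 3.8+). A minimal sketch; the function name is hypothetical and $p_1$ must lie strictly between 0 and 1.

```python
from statistics import NormalDist


def proportion_after_shift(p1, d):
    """p2 = Phi(Phi^-1(p1) + d): the proportion above a fixed threshold after
    an underlying Normal distribution shifts upwards by d standard deviations."""
    phi = NormalDist()  # standard Normal: mean 0, SD 1
    return phi.cdf(phi.inv_cdf(p1) + d)


for p1 in (0.30, 0.50):  # hypothetical starting proportions
    print(f"p1 = {p1:.0%} -> p2 = {proportion_after_shift(p1, 0.5):.0%}")
```

For a starting value of 50% and d = 0.5 this gives Φ(0.5), about 69%; starting from 30%, the same effect size gives about 49%.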


    References

    ALTMAN, D.G. (1991) Practical Statistics for Medical Research. London: Chapman and Hall.

    BANGERT, R.L., KULIK, J.A. AND KULIK, C.C. (1983) Individualised systems of instruction in secondary schools. Review of Educational Research, 53, 143-158.

    BANGERT-DROWNS, R.L. (1988) The effects of school-based substance abuse education: a meta-analysis. Journal of Drug Education, 18, 3, 243-65.

    BAUGH, F. (2002) Correcting effect sizes for score reliability: A reminder that measurement and substantive issues are linked inextricably. Educational and Psychological Measurement, 62, 2, 254-263.

    CLIFF, N. (1993) Dominance Statistics: ordinal analyses to answer ordinal questions. Psychological Bulletin, 114, 3, 494-509.

    COHEN, J. (1969) Statistical Power Analysis for the Behavioral Sciences. NY: Academic Press.

    COHEN, J. (1994) The Earth is Round (p


    HEDGES, L. AND OLKIN, I. (1985) Statistical Methods for Meta-Analysis. New York: Academic Press.

    HEMBREE, R. (1988) Correlates, causes, effects and treatment of test anxiety. Review of Educational Research, 58(1), 47-77.

    HUBERTY, C.J. (2002) A history of effect size indices. Educational and Psychological Measurement, 62, 2, 227-240.

    HYMAN, R.B., FELDMAN, H.R., HARRIS, R.B., LEVIN, R.F. AND MALLOY, G.B. (1989) The effects of relaxation training on medical symptoms: a meta-analysis. Nursing Research, 38, 216-220.

    KAVALE, K.A. AND FORNESS, S.R. (1983) Hyperactivity and diet treatment: a meta-analysis of the Feingold hypothesis. Journal of Learning Disabilities, 16, 324-330.

    KESELMAN, H.J., HUBERTY, C.J., LIX, L.M., OLEJNIK, S., CRIBBIE, R.A., DONAHUE, B., KOWALCHUK, R.K., LOWMAN, L.L., PETOSKEY, M.D., KESELMAN, J.C. AND LEVIN, J.R. (1998) Statistical practices of educational researchers: An analysis of their ANOVA, MANOVA, and ANCOVA analyses. Review of Educational Research, 68, 3, 350-386.

    KIRK, R.E. (1996) Practical Significance: A concept whose time has come. Educational and Psychological Measurement, 56, 5, 746-759.

    KULIK, J.A., KULIK, C.C. AND BANGERT, R.L. (1984) Effects of practice on aptitude and achievement test scores. American Educational Research Journal, 21, 435-447.

    LEPPER, M.R., HENDERLONG, J. AND GINGRAS, I. (1999) Understanding the effects of extrinsic rewards on intrinsic motivation: Uses and abuses of meta-analysis: Comment on Deci, Koestner, and Ryan. Psychological Bulletin, 125, 6, 669-676.

    LIPSEY, M.W. (1992) Juvenile delinquency treatment: a meta-analytic inquiry into the variability of effects. In T.D. Cook, H. Cooper, D.S. Cordray, H. Hartmann, L.V. Hedges, R.J. Light, T.A. Louis and F. Mosteller (Eds) Meta-analysis for explanation. New York: Russell Sage Foundation.

    LIPSEY, M.W. AND WILSON, D.B. (1993) The Efficacy of Psychological, Educational, and Behavioral Treatment: Confirmation from meta-analysis. American Psychologist, 48, 12, 1181-1209.

    MCGRAW, K.O. (1991) Problems with the BESD: a comment on Rosenthal's 'How Are We Doing in Soft Psychology'. American Psychologist, 46, 1084-6.

    MCGRAW, K.O. AND WONG, S.P. (1992) A Common Language Effect Size Statistic. Psychological Bulletin, 111, 361-365.

    MOSTELLER, F., LIGHT, R.J. AND SACHS, J.A. (1996) Sustained inquiry in education: lessons from skill grouping and class size. Harvard Educational Review, 66, 797-842.

    OAKES, M. (1986) Statistical Inference: A Commentary for the Social and Behavioral Sciences. New York: Wiley.

    OLEJNIK, S. AND ALGINA, J. (2000) Measures of Effect Size for Comparative Studies: Applications, Interpretations and Limitations. Contemporary Educational Psychology, 25, 241-286.


    ROSENTHAL, R. (1994) Parametric Measures of Effect Size. In H. Cooper and L.V. Hedges (Eds.), The Handbook of Research Synthesis. New York: Russell Sage Foundation.

    ROSENTHAL, R. AND RUBIN, D.B. (1982) A simple, general purpose display of magnitude of experimental effect. Journal of Educational Psychology, 74, 166-169.

    RUBIN, D.B. (1992) Meta-analysis: literature synthesis or effect-size surface estimation. Journal of Educational Statistics, 17, 4, 363-374.

    SHYMANSKY, J.A., HEDGES, L.V. AND WOODWORTH, G. (1990) A reassessment of the effects of inquiry-based science curricula of the 60's on student performance. Journal of Research in Science Teaching, 27, 127-144.

    SLAVIN, R.E. AND MADDEN, N.A. (1989) What works for students at risk? A research synthesis. Educational Leadership, 46(4), 4-13.

    SMITH, M.L. AND GLASS, G.V. (1980) Meta-analysis of research on class size and its relationship to attitudes and instruction. American Educational Research Journal, 17, 419-433.

    SNYDER, P. AND LAWSON, S. (1993) Evaluating Results Using Corrected and Uncorrected Effect Size Estimates. Journal of Experimental Education, 61, 4, 334-349.

    STRAHAN, R.F. (1991) Remarks on the Binomial Effect Size Display. American Psychologist, 46, 1083-4.

    THOMPSON, B. (1999) Common methodology mistakes in educational research, revisited, along with a primer on both effect sizes and the bootstrap. Invited address presented at the annual meeting of the American Educational Research Association, Montreal. [Accessed from, January 2000]

    TYMMS, P., MERRELL, C. AND HENDERSON, B. (1997) The First Year at School: A Quantitative Investigation of the Attainment and Progress of Pupils. Educational Research and Evaluation, 3, 2, 101-118.

    VINCENT, D. AND CRUMPLER, M. (1997) British Spelling Test Series Manual 3X/Y. Windsor: NFER-Nelson.

    WANG, M.C. AND BAKER, E.T. (1986) Mainstreaming programs: Design features and effects. Journal of Special Education, 19, 503-523.

    WILCOX, R.R. (1998) How many discoveries have been lost by ignoring modern statistical methods? American Psychologist, 53, 3, 300-314.

    WILKINSON, L. AND TASK FORCE ON STATISTICAL INFERENCE, APA BOARD OF SCIENTIFIC AFFAIRS (1999) Statistical Methods in Psychology Journals: Guidelines and Explanations. American Psychologist, 54, 8, 594-604.
