SD-SE

  • 8/12/2019 SD-SE

    Standard Deviation, Standard Error

    Which 'Standard' Should We Use?

    George W. Brown, MD

    Standard deviation (SD) and standard error (SE) are quietly but extensively used in biomedical publications. These terms and notations are used as descriptive statistics (summarizing numerical data), and they are used as inferential statistics (estimating population parameters from samples). I review the use and misuse of SD and SE in several authoritative medical journals and make suggestions to help clarify the usage and meaning of SD and SE in biomedical reports.

    (Am J Dis Child 1982;136:937-941)

    Standard deviation (SD) and standard error (SE) have surface similarities; yet, they are conceptually so different that we must wonder why they are used almost interchangeably in the medical literature. Both are usually preceded by a plus-minus symbol (±), suggesting that they define a symmetric interval or range of some sort.

    They both appear almost always with a mean (average) of a set of measurements or counts of something. The medical literature is replete with statements like, "The serum cholesterol measurements were distributed with a mean of 180 ± 30 mg/dL (SD)."

    In the same journal, perhaps in the same article, a different statement may appear: "The weight gains of the subjects averaged 720 (mean) ± 32 g/mo (SE)." Sometimes, as discussed further, the summary data are presented as the "mean of 120 mg/dL ± 12" without the "12" being defined as SD or SE, or as some other index of dispersion. Eisenhart1 warned against this "peril of shorthand expression" in 1968; Feinstein2 later again warned about the fatuity and confusion contained in any a ± b statements where b is not defined. Warnings notwithstanding, a glance through almost any medical journal will show examples of this usage.

    From the Los Lunas Hospital and Training School, New Mexico, and the Department of Pediatrics, University of New Mexico School of Medicine, Albuquerque.

    Reprint requests to Los Lunas Hospital and Training School, Box 1269, Los Lunas, NM 87031 (Dr Brown).

    Medical journals seldom state why SD or SE is selected to summarize data in a given report. A search of the three major pediatric journals for 1981 (American Journal of Diseases of Children, Journal of Pediatrics, and Pediatrics) failed to turn up a single article in which the selection of SD or SE was explained. There seems to be no uniformity in the use of SD or SE in these journals or in The Journal of the American Medical Association (JAMA), the New England Journal of Medicine, or Science. The use of SD and SE in the journals will be discussed further.

    If these respected, well-edited journals do not demand consistent use of either SD or SE, are there really any important differences between them? Yes, they are remarkably different, despite their superficial similarities. They are so different in fact that some authorities have recommended that SE should rarely or never be used to summarize medical research data. Feinstein2 noted the following:

    A standard error has nothing to do with standards, with errors, or with the communication of scientific data. The concept is an abstract idea, spawned by the imaginary world of statistical inference and pertinent only when certain operations of that imaginary world are met in scientific reality.2(p336)

    Glantz3 also has made the following recommendation:

    Most medical investigators summarize their data with the standard error because it is always smaller than the standard deviation. It makes their data look better . . . data should never be summarized with the standard error of the mean.3(p25)

    A closer look at the source and meaning of SD and SE may clarify why medical investigators, journal reviewers, and editors should scrutinize their usage with considerable care.

    DISPERSION

    An essential function of "descriptive statistics" is the presentation of condensed, shorthand symbols that epitomize the important features of a collection of data. The idea of a central value is intuitively satisfactory to anyone who needs to summarize a group of measurements, or counts. The traditional indicators of a central tendency are the mode (the most frequent value), the median (the middle value when the values are ranked from lowest to highest), and the mean (the average). Each has its special uses, but the mean has great convenience and flexibility for many purposes.
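The three indicators of central tendency named above can be computed with Python's standard `statistics` module. This is a minimal sketch; the `values` list is hypothetical data invented for illustration and does not come from the article.

```python
import statistics

# Hypothetical measurements, chosen only for illustration.
values = [2, 3, 3, 4, 5, 7, 18]

mode = statistics.mode(values)      # most frequent value
median = statistics.median(values)  # middle value of the ranked data
mean = statistics.mean(values)      # arithmetic average

print("mode:", mode, "median:", median, "mean:", mean)
```

In this made-up set, the single large value pulls the mean above the median, which is one reason the three indicators can tell different stories about the same data.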

    The dispersion of a collection of values can be shown in several ways; some are simple and concise, and others are complex and esoteric. The range is a simple, direct way to indicate the spread of a collection of values, but it does not tell how the values are distributed. Knowledge of the mean adds considerably to the information carried by the range.

    Another index of dispersion is provided by the differences (deviations) of each value from the mean of the values. The trouble with this approach is that some deviations will be positive, and some will be negative, and their sum will be zero. We could ignore the sign of each deviation, ie, use the "absolute mean deviation," but mathematicians tell us that working with absolute numbers is extremely difficult and fraught with technical disadvantages.
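A short numerical sketch may make the point about signed deviations concrete. The data are hypothetical, and the variable names (`deviations`, `mean_abs_dev`) are illustrative choices rather than anything defined in the article.

```python
# Hypothetical values, chosen only for illustration.
values = [4, 8, 6, 5, 7]

mean = sum(values) / len(values)

# Signed deviations of each value from the mean.
deviations = [v - mean for v in values]

# Positive and negative deviations cancel, so their sum carries no information.
sum_dev = sum(deviations)

# Ignoring the signs gives the mean of the absolute deviations.
mean_abs_dev = sum(abs(d) for d in deviations) / len(values)

print("sum of deviations:", sum_dev)
print("mean absolute deviation:", mean_abs_dev)
```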

    A neglected method for summarizing the dispersion of data is the calculation of percentiles (or deciles, or quartiles). Percentiles are used more frequently in pediatrics than in other branches of medicine, usually in growth charts or in other data arrays that are clearly not symmetric or bell shaped. In the general medical literature, percentiles are sparsely used, apparently because of a common, but erroneous, assumption that the mean ± SD or SE is satisfactory for summarizing central tendency and dispersion of all sorts of data.
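As a rough illustration of how quartiles summarize asymmetric data, the sketch below uses `statistics.quantiles` from the Python standard library on a hypothetical right-skewed set of values. The data are invented, and the interpolation rule is the function's default ("exclusive") method, which is only one of several conventions for computing percentiles.

```python
import statistics

# Hypothetical right-skewed measurements, chosen only for illustration.
values = [1, 1, 2, 2, 3, 3, 4, 5, 8, 13, 40]

# n=4 returns the three cut points that divide the data into quartiles.
q1, q2, q3 = statistics.quantiles(values, n=4)

mean = statistics.mean(values)

print("quartiles:", q1, q2, q3)
print("mean:", mean)
```

Here the mean sits well above the median (the second quartile) because of the long upper tail, while the quartiles show where the bulk of the values actually lie.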

    STANDARD DEVIATION

    The generally accepted answer to the need for a concise expression for the dispersion of data is to square the difference of each value from the group mean, giving all positive values. When these squared deviations are added up and then divided by the number of values in the group, the result is the variance. The variance is always a positive number, but it is in different units than the mean. The way around this inconvenience is to use the square root of the variance, which is the population standard deviation (σ), which for convenience will be called SD. Thus, the SD is the square root of the averaged squared deviations from the mean. The SD is sometimes called by the shorthand term, "root-mean-square."
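The steps just described (square each deviation, average the squares, take the square root) can be sketched in a few lines of Python. The data are hypothetical. The calculation divides by the number of values, as the text does, which corresponds to `statistics.pstdev`; the sample SD, which divides by one less than the number of values, is a different quantity and is not computed here.

```python
import math
import statistics

# Hypothetical values, chosen only for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(values) / len(values)

# Square each deviation from the mean so that none is negative.
squared_deviations = [(v - mean) ** 2 for v in values]

# Variance: the average of the squared deviations (divide by n).
variance = sum(squared_deviations) / len(values)

# SD: the square root of the variance, back in the original units.
sd = math.sqrt(variance)

print("mean:", mean, "variance:", variance, "SD:", sd)

# Cross-check against the standard library's population SD.
print("statistics.pstdev:", statistics.pstdev(values))
```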

    The SD, calculated in this way, is in the same units as the original values and the mean. The SD has additional properties that make it attractive for summarizing dispersion, especially if the data are distributed symmetrically in the revered bell-shaped, gaussian curve. Although there are an infinite number of gaussian curves, the one for the data at hand is described completely by the mean and SD. For example, the mean ± 1.96 SD will enclose 95% of the values; the mean ± 2.58 SD will enclose 99% of the values. It is this symmetry and elegance that contribute to our admiration of the gaussian curve.
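The 95% and 99% figures can be checked numerically for an ideal gaussian curve. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `coverage` and the example mean and SD (180 and 30, borrowed from the cholesterol statement quoted earlier) are illustrative assumptions.

```python
from statistics import NormalDist


def coverage(k, mean=0.0, sd=1.0):
    """Fraction of a gaussian distribution lying within mean ± k SD."""
    dist = NormalDist(mean, sd)
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)


# The coverage depends only on k, not on the particular mean and SD.
print("within 1.96 SD:", round(coverage(1.96), 4))
print("within 2.58 SD:", round(coverage(2.58), 4))
print("within 1.96 SD (mean 180, SD 30):", round(coverage(1.96, 180, 30), 4))
```

The result holds only when the values really follow a gaussian curve; for other shapes the same interval can enclose a quite different fraction of the data.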

    The bad news, especially for biologic data, is that many collections of measurements or counts are not symmetric or bell shaped. Biologic data tend to be skewed or double humped, J shaped, U shaped, or flat on top. Regardless of the shape of the distribution, it is still possible by rote arithmetic to calculate an SD, although it may be inappropriate and misleading.

    For example, one can imagine throwing a six-sided die several hundred times and recording the score at each throw. This would generate a flat-topped, ie, rectangular, distribution, with about the same number of counts for each score, 1 through 6. The mean of the scores would be 3.5 and the SD would be about 1.7. The trouble is that the collection of scores is not bell shaped, so the SD is not a good summary statement of the true form of the data. (It is mildly upsetting to some

    "V