Upload
dangdien
View
233
Download
0
Embed Size (px)
Citation preview
aguideto
ProbabilityandStatisticsinMicrosoftExcel™
Resourcestosupportthelearningofmathematics,statisticsandORinhighereducation.
www.mathstore.ac.uk
TheStatisticalEducationthroughProblemSolving(STEPS)glossary
www.stats.gla.ac/steps/glossary
ProbabilityandStatisticsinMicrosoftExcel™
Excelprovidesmorethan100functionsrelatingtoprobabilityandstatistics.Italsohasafacilityforconstructingawiderangeofchartsandgraphsfordisplayingdata.ThisleafletprovidesaquickreferenceguidetoassistyouinharnessingExcel’sstatisticalcapability.Exceptwhereindicated,thefeaturesincludedhereareavailableinExcelVersions4.0andabove.AlmostalltheinstructionsherealsoapplytothespreadsheetfacilityinOpenOffice(http://openoffice.org‐suite.com/);anyslightvariationsincommandsshouldbeobvioustotheuser.Excelisnotdesignedforstatisticalcomputing.Ifyourequirestatisticalanalysisbeyonddatavalidationandmanipulation,tabulation,presentationandcalculationofsummarystatistics,youareadvisedtouseabespokestatisticalpackagesuchasMinitaborSPSS.
ExcelhasanAnalysisToolpakoptional“add‐in”facilitythatincludesmacrosforcarryingoutmanyelementarystatisticalanalyses.Theinstructionsforinstallationofthisadd‐invarywiththeversionofExcel—usetheHelpfacilityinExcelforfurtherinformationonthis.Thisadd‐infacilityisnotusedinthisleaflet.Therearetworeasonswhythisadd‐inshouldbeusedwithcare:• Unlikeotherspreadsheetfunctionality,whichensuresthatcalculationsautomaticallyupdateinthe
lightofchangeselsewhereintheworkbook,theoutputfromtheadd‐inisnotdynamicallylinkedtothesourcedata.Henceifanyofthedatachangetheadd‐inmustberunagaintoobtainupdatedoutput.
• Outputfromtheadd‐incanbemisleading(seehttp://support.microsoft.com/kb/829252forexample).Thereareothercommerciallyavailableadd‐insthatmakeuseofExcel’sfamiliaruserinterfacebutsupplementitsstatisticalfunctionality.Examplesinclude:
Analyse‐it® http://www.analyse‐it.com/
R‐Excel http://rcom.univie.ac.at/
Unistat http://www.unistat.com/
XLSTAT http://www.xlstat.com/en/home/
StatTools http://www.palisade.com/stattools/
UsingthisleafletSupposeyouhaveasampleofthreedata,10.4,11.2and16.4,thatyouhaveenteredintocellsA2:A4onaworksheet.InExcelafunction,e.g.SUM,canbeappliedtothesedatainoneoffourways:
=SUM(10.4,11.2,16.4)=SUM(A2,A3,A4)=SUM(A2:A4)
=SUM(x) wherexisthenameattachedtorangeA2:A4.
Inthisleaflet,forsimplicity,wehavechosentorefertonamedranges.Tonamearange,simplyhighlighttherangeofcells,clickintheNameBoxonthefarleftoftheFormulaBar,typeintherequiredname,e.g.x,thenpressEnter.InExcel2007namescanbemanagedviaFormulas>NameManager.Ifyouprefernottousenamestheninwhatfollowssimplyreplacethenameoftherange,e.g.x,bytherangeaddress,e.g.A2:A4.
DescriptiveStatisticsAssumingasampleofdatainrangex
Sampletotal,Σx =SUM(x)Samplesize,n =COUNT(x)Samplemean,Σx/n =AVERAGE(x)
Samplevariance,s2 =VAR(x)Samplestandarddeviation,s =STDEV(x)Meansquareddeviation =VARP(x)
Rootmeansquareddeviation =STDEVP(x)Correctedsumofsquares,Sxx =DEVSQ(x)Rawsumofsquares,Σx2 =SUMSQ(x)
Minimumvalue =MIN(x)Maximumvalue =MAX(x)
Range =MAX(x)‐MIN(x)LowerQuartile,Q1* =QUARTILE(x,1)Median,Q2 =MEDIAN(x)
UpperQuartile,Q3* =QUARTILE(x,3)Interquartilerange,IQR =QUARTILE(x,3)‐QUARTILE(x,1)KthPercentile =PERCENTILE(x,K%) whereKisanumberbetween0and100
Mode =MODE(x)*Note:Thereareseveraldifferentdefinitionsfortheupperandlowerquartiles,sothevaluescalculatedby
Excelmaynotagreewithyourtextbookorotherstatisticalcalculationtools.Boxplot Seehttp://www.coventry.ac.uk/ec/~nhunt/boxplot.htm
GroupedFrequencyDataAssumingafrequencydistributionwithclassmidpointsstoredinrangexandfrequenciesinrangef:
Samplesize,n =SUM(f)Sampletotal,Σfx =SUMPRODUCT(f,x)
Samplemean,Σfx/n =SUMPRODUCT(f,x)/SUM(f)
Correctedsumofsquares,Sxx =SUMPRODUCT(f,x,x)‐SUMPRODUCT(f,x)^2/SUM(f)Samplevariance,s2 =(SUMPRODUCT(f,x,x)‐SUMPRODUCT(f,x)^2/SUM(f))/(SUM(f)‐1)
Samplestandarddeviation,s =SQRT(Samplevariance)
GraphicalRepresentationsExceloffersawiderangeofcharttypesfordisplayingdata.Manyoftheseareover‐elaborate.In
particular,3‐Deffectscanbemisleadingandshouldbeavoided.
InExcel2007toconstructachartforyourdata:1. Selecttherangecontainingyourdata,includinganyroworcolumnlabels.2. Onthemainribbon,clickontheInserttab.
3. UndertheChartsgroupoficons,selectthecharttyperequired,thenthepreferredchartsubtype.4. UnderChartToolsonthemainribbon,usetheDesign,LayoutandFormattabstocustomisethechart.
InearlierversionsofExcel,selectthedatarangeandthenInsert>CharttoinvoketheChartWizard.
PermutationsandCombinationsNumberofdifferentcombinationsofmobjectsselectedfromnobjectsnCm =COMBIN(n,m)
NumberofdifferentpermutationsofmobjectsselectedfromnobjectsnPm =PERMUT(n,m)
StandardProbabilityDistributionsAssumingarandomvariableXandconstantsaandbBinomial Bin(n,p)P(X=a) =BINOMDIST(a,n,p,FALSE)P(X≤a) =BINOMDIST(a,n,p,TRUE)
Geometric Geom(p)P(X=a) =BINOMDIST(1,a,p,FALSE)/aP(X≤a) =1‐BINOMDIST(0,a,p,FALSE)
Poisson Po(λ )P(X=a) =POISSON(a,lambda,FALSE)P(X≤a) =POISSON(a,lambda,TRUE)
Pascal Pasc(n,p)P(X=a) =NEGBINOMDIST(a‐n,n,p)P(X≤a) =BETADIST(p,n,a‐n+1)/BETADIST(1,n,a‐n+1)
Normal N(µ ,σ2)f(a) =NORMDIST(a,mu,sigma,FALSE)P(X≤a) =NORMDIST(a,mu,sigma,TRUE)P(a≤X≤b) =NORMDIST(b,mu,sigma,TRUE) ‐NORMDIST(a,mu,sigma,TRUE)P(X≥b) =1‐NORMDIST(b,mu,sigma,TRUE)
Exponential Expon(θ )f(a) =EXPONDIST(a,theta,FALSE)P(X≤a) =EXPONDIST(a,theta,TRUE)P(a≤X≤b) =EXP(‐a*theta)‐EXP(‐b*theta)P(X≥b) =EXP(‐b*theta)
Gamma Ga(α ,β )f(a) =GAMMADIST(a,alpha,beta,FALSE)P(X≤a) =GAMMADIST(a,alpha,beta,TRUE)P(a≤X≤b) =GAMMADIST(b,alpha,beta,TRUE) ‐GAMMADIST(a,alpha,beta,TRUE)P(X≥b) =1‐GAMMADIST(b,alpha,beta,TRUE)
TestStatisticsforPopularSignificanceTestsOnesampletestofameanAssumingasampleofdatainrangex,drawnfromapopulationwithmeanµandstandarddeviationσ:
H0:µ=µ0H1:µ≠µ0Teststatistic,z =(AVERAGE(x)‐mu0)/(sigma/SQRT(COUNT(x))) assumingσknownTeststatistic,t =(AVERAGE(x)‐mu0)/(STDEV(x)/SQRT(COUNT(x))) assumingσunknownOnesampletestofavarianceAssumingasampleofdatainrangex,drawnfromapopulationwithmeanµandstandarddeviationσ:
H0:σ2=σ02H1:σ2> σ0
2Teststatistic,χ2 =DEVSQ(x)/sigma0^2TwosampletestofdifferencebetweenmeansAssumingtwosamplesofdatainrangesxandy,drawnfrompopulationswithmeansµ1andµ2andequal
variances:H0:µ1‐µ2=cH1:µ1‐µ2≠cEstimatetheunknowncommonstandarddeviationbythepooledestimate:
s =SQRT((DEVSQ(x)+DEVSQ(y))/(COUNT(x)+COUNT(y)‐2))Teststatistic,t =(AVERAGE(x)‐AVERAGE(y)‐c)/(s*SQRT(1/COUNT(x)+1/COUNT(y)))TwosampletestofratioofvariancesAssumingtwosamplesofdatainrangesxandy,drawnfrompopulationswithvariancesσ1
2andσ22:
H0:σ12=σ2
2H1:σ12>σ2
2Teststatistic,F =VAR(x)/VAR(y)Chi‐squaredtestofassociationAssumingatwo‐waycontingencytableofobservedfrequencies.H0:rowfactorindependentofcolumnfactorH1:someassociationbetweenrowandcolumnfactorsThesuggestedlayoutbelowfora4x2tablecaneasilybemodifiedfortablesofothersizes.
A1: =SUM(C3:D6)A3: =SUM(C3:D3) copydowntoA6C1: =SUM(C3:C6) copyacrosstoD1G3: =$A3*C$1/$A$1 copyintoG3:H6C8: =CHITEST(C3:D6,G3:H6)C9: =(COUNT(A3:A6)‐1)*(COUNT(C1:D1)‐1)C10: =CHIINV(C8,C9)
CriticalValuesandP‐valuesforStatisticalTestsTherearetwoapproachestoconductingsignificancetests.Someanalystsliketocomparetheteststatisticwiththecriticalvalueforagivensignificancelevel;othersprefertocalculatetheP‐valuecorrespondingto
theteststatistic.Excelcanbeusedforeithermethod.Assumingsignificancelevelα,(typicallyα=5%or0.05):
Two‐tailedz‐testUppertailcriticalvalue =NORMSINV(1‐alpha/2)P‐valueforgivenz =2*(1‐NORMSDIST(ABS(z)))Two‐tailedt‐testwithvdegreesoffreedomUppertailcriticalvalue =TINV(alpha,v)P‐valueforgivent =TDIST(ABS(t),v,2)
One‐tailedχ2‐testwithvdegreesoffreedomUppertailcriticalvalue =CHIINV(alpha,v)P‐valueforgivenchisquared=CHIDIST(chisquared,v)One‐tailedF‐testwithv1degreesoffreedominthenumeratorandv2inthedenominatorUppertailcriticalvalue =FINV(alpha,v1,v2)P‐valueforgivenF =FDIST(F,v1,v2)
ConfidenceLimitsAssumingdegreeofconfidence100(1‐α)% (e.g.for95%confidenceα=0.05):
One‐samplestatistics,withdatainrangex
For µ (σ known) Lowerlimit=AVERAGE(x)‐NORMSINV(1‐alpha/2)*sigma/SQRT(COUNT(x))
or =AVERAGE(x)‐CONFIDENCE(alpha,sigma,COUNT(x))
Upperlimit=AVERAGE(x)+NORMSINV(1‐alpha/2)*sigma/SQRT(COUNT(x))or =AVERAGE(x)+CONFIDENCE(alpha,sigma,COUNT(x))
For µ (σ unknown) Lowerlimit=AVERAGE(x)‐TINV(alpha, COUNT(x)‐1)*STDEV(x)/SQRT(COUNT(x))
Upperlimit=AVERAGE(x)+TINV(alpha, COUNT(x)‐1)*STDEV(x)/SQRT(COUNT(x))
For σ2 Lowerlimit=(DEVSQ(x)/CHIINV(alpha/2,COUNT(x))‐1)
Upperlimit=(DEVSQ(x)/CHIINV(1‐alpha/2,COUNT(x))‐1)
Two‐samplestatistics,withdataforthefirstsampleinrangex,andthesecondsampleinrangey
For µ x‐µ y (σ xknown,σ y known)
Lowerlimit=AVERAGE(x)‐AVERAGE(y)‐NORMSINV(1‐alpha/2)*SQRT(sigmax^2/COUNT(x)+sigmay^2/COUNT(y))
Upperlimit=AVERAGE(x)‐AVERAGE(y)+NORMSINV(1‐alpha/2)*SQRT(sigmax^2/COUNT(x)+sigmay^2/COUNT(y))
For µ x‐µ y (σ xandσ y unknownbutassumedequal)
Estimatetheunknowncommonstandarddeviationbythepooledestimate:s =SQRT((DEVSQ(x)+DEVSQ(y))/(COUNT(x)+COUNT(y)‐2))
Lowerlimit=AVERAGE(x)‐AVERAGE(y)‐TINV(alpha,COUNT(x)+COUNT(y)‐2)*s*SQRT(1/COUNT(x)+1/COUNT(y))
Upperlimit=AVERAGE(x)‐AVERAGE(y)+TINV(alpha,COUNT(x)+COUNT(y)‐2)*s*SQRT(1/COUNT(x)+1/COUNT(y))
For σ x2/σ y
2
Lowerlimit=DEVSQ(x)/DEVSQ(y)/FINV(alpha/2,COUNT(x)‐1,COUNT(y)‐1)
Upperlimit(DEVSQ(x)/DEVSQ(y)/FINV(1‐alpha/2,COUNT(x)‐1,COUNT(y)‐1)
SimpleLinearRegressionInExcelVersions5andabove,aregressionline(ortrendline)canbeaddedtoascatterplotbyright‐clicking
ononeoftheplottedpointsandselectingAddTrendlinefromtheshortcutmenu.Bothlinearandavarietyofnon‐linearmodelsmaybefittedtothedata.Theequationofthefittedmodelmaybedisplayed,togetherwiththevalueofthecoefficientofdetermination,R2.Therearealsooptionstoextrapolatethe
trendlineineitherdirection,ortoforcethetrendlinetohaveaspecificintercept.
Thetrendlineapproachispurelygraphical.Tocalculatepredictions,regressionfunctionsmustbeused.
Assumingasampleofvaluesoftheindependentvariableinrangex,andcorrespondingvaluesofthe
dependentvariableinrangey:
Leastsquaresestimateofintercept,a =INTERCEPT(y,x)Leastsquaresestimateofslope,b =SLOPE(y,x)
Sxy =SUMPRODUCT(x,y)‐COUNT(x)*AVERAGE(x)*AVERAGE(y)Sxx =DEVSQ(x)Syy =DEVSQ(y)
Samplecovariance,Cov(x,y) =COVAR(x,y)*COUNT(x)/(COUNT(x)‐1)Estimateofσ,s =STEYX(y,x)
Predictionofyatx=x0,ŷ=a+bx0 =FORECAST(x0,y,x)Estimatedstandarderrorofindividualpredictedyatx=x0
=STEYX(y,x)*SQRT(1+1/COUNT(x)+(x0‐AVERAGE(x))^2/DEVSQ(x))Estimatedstandarderrorofmeanpredictedyatx=x0 =STEYX(y,x)*SQRT(1/COUNT(x)+(x0‐AVERAGE(x))^2/DEVSQ(x))
CorrelationAssumingtwosamplesofpaireddatainrangesxandy:
Pearsonproductmomentcorrelationcoefficient,r =CORREL(x,y)
RankCorrelationAssumingtwosamplesofpaireddatainrangesxandywithnoties:Rankofithvalueinrangex =RANK(INDEX(x,i),x,1)Assumingtwosamplesofpaireddatainrangesxandywithsometiedvalues:Rankofithvalueinrangex =(RANK(INDEX(x,i),x,1)‐RANK(INDEX(x,i),x,0)+COUNT(x)+1)/2Assumingthattherangesrxandrycontaintheranksofthedatainxandyrespectively:Spearmanrankcorrelationcoefficient,rS=CORREL(rx,ry)
Intheexampleabove:
D2: =RANK(B2,$B$2:$B$7,1) copydowntoD7E2: =RANK(B2,$B$2:$B$7,0) copydowntoE7F2: =(D2‐E2+COUNT($B$2:$B$7)+1)/2 copydowntoF7F9: =CORREL(C2:C7,F2:F7) adjustedforties
TimeSeriesTheexamplesbelowrefertothreeyearsofobservedquarterlydata.Forecastsaremadeforafurtherfourquarters(oneextrayear).
Levelonly
Simplemovingaverageperiod5C4: =AVERAGE(B2:B6) copydowntoC11C14: =C$11 copydowntoC17
Centredmovingaverageperiod4D4: =(AVERAGE(B2:B5)+AVERAGE(B3:B6))/2 copydowntoD11D14: =D$11 copydowntoD17
ExponentiallyweightedmovingaverageE2: =B2 initiallevelestimateE3: =$G2*B3+(1‐$G2)*E2 copydowntoE13E14: =E$13 copydowntoE17ThechartwasdrawnbyhighlightingB1:B17andE1:E17thenusingInsert>Charts>Line>2‐DLine.
Levelandconstanttrend
C2: =FORECAST(A2,$B$2:$B$13,$A$2:$A$13) copydowntoC17
Levelandchangingtrend
C2: =B2 initiallevelestimateC3: =$F2*B3+(1‐$F2)*(C2+D2) copydowntoC13D2: =B3‐B2 initialtrendestimateD3: =$G2*(C3‐C2)+(1‐$G2)*D2 copydowntoD13
E3: =C2+D2 copydowntoE13E14: =C$13+(A14‐A$13)*D$13 copydowntoE17
Level,changingtrendandseasonality
C5: =AVERAGE(B2:B5) initiallevelestimateC6: =G$2*B6/E2+(1‐G$2)*(C5+D5) copydowntoC13D5: =(AVERAGE(B6:B9)‐C5)/4 initialtrendestimateD6: =H$2*(C6‐C5)+(1‐H$2)*D5 copydowntoD13
E2: =B2/C$5 copydowntoE5,initialseasonalestimatesE6: =I$2*B6/C6+(1‐I$2)*E2 copydowntoE13F6: =(C5+D5)*E2 copydowntoF13
F14: =(C$13+(A14‐A$13)*D$13)*E10 copydowntoF17
Version9,9June2009