Stats Midterm

Stats Midterm exam 2014

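  • Summary Part 1

    Population = the full set of research results (N)

    Sample = part of the population (n)

    Types of data:

    1. Categorical = responses (e.g. eye colour, level of satisfaction)
       a) Nominal = categories without order (e.g. yes/no answers)
       b) Ordinal = ordered values (e.g. 1. poor, 2. average, 3. good)
    2. Numerical
       a) Discrete = counting (whole numbers, e.g. age in years)
       b) Continuous = measurement (e.g. weight)

    Cumulative frequency distribution = running total of the frequencies (or percentages) of a population or sample up to a certain point, e.g.

    Class                  Frequency   Percentage   Cumulative Fr.   Cumulative P.
    10 but less than 20    3           15%          3                15%
    20 but less than 30    6           30%          9                45%
    30 but less than 40    5           25%          14               70%

    Cumulative Fr. is obtained by adding the frequency values, Cumulative P. by adding the percentage values.

    $x_i$ = $i$th value of the variable $X$

    Arithmetic mean (population): $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i = \frac{x_1 + x_2 + x_3 + \dots + x_N}{N}$, where $x_1, x_2, x_3, \dots$ are the population values

    Arithmetic mean (sample): $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$, where $x_1, x_2, x_3, \dots$ are the sample values

    Median = the value that stands in the middle of the ordered data (e.g. for 1, 2, 2, 3, 3, 4, 5 the median is 3)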
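
The cumulative columns and the median example above can be reproduced with a short Python sketch. The total of 20 observations is an assumption inferred from the table (3 observations correspond to 15%); the variable names are illustrative only.

```python
from itertools import accumulate
from statistics import mean, median

# Class frequencies from the table above. The total of 20 observations is
# inferred from "3 = 15%"; the remaining classes are not listed in the notes.
classes = ["10 but less than 20", "20 but less than 30", "30 but less than 40"]
frequencies = [3, 6, 5]
total = 20

# Running totals of the frequencies, then the same totals as percentages.
cumulative_fr = list(accumulate(frequencies))
cumulative_pct = [100 * c / total for c in cumulative_fr]

for label, f, cf, cp in zip(classes, frequencies, cumulative_fr, cumulative_pct):
    print(f"{label:<22}{f:>4}{cf:>5}{cp:>7.0f}%")

# Mean and median of the small example used for the median.
values = [1, 2, 2, 3, 3, 4, 5]
sample_mean = mean(values)      # sum of the values divided by their count
sample_median = median(values)  # middle value of the sorted data
```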

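  • The position of the median is also calculated by: $\frac{n+1}{2}$

    Note: with an even number of values, the median is the average of the two values in the middle.

    Variance (sample): $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$

    Variance (population) = same formula, other symbols: $s^2 \to \sigma^2$, $\bar{x} \to \mu$, $n-1 \to N$

    Standard deviation (sample): $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$

    Standard deviation (population) = same formula, other symbols (see above). Basically, the standard deviation is in both cases $\sqrt{\text{variance}}$.

    Coefficient of variation: $CV = \left(\frac{s}{\bar{x}}\right) \cdot 100\%$

    Measures relative variation: expresses the standard deviation as a percentage of the mean
    Always in percentage
    Shows variation relative to the mean
    Can be used to compare two or more sets of data measured in different units

    Covariance (sample): $Cov(x, y) = s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$

    Covariance (population) = same formula, other symbols (see above)

    Measures the strength of the linear relationship between two variables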
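
As a rough illustration of the sample formulas above, the following Python sketch computes the variance, standard deviation, coefficient of variation and covariance. The data values and the helper name `sample_covariance` are hypothetical and not taken from the notes.

```python
from statistics import mean, stdev, variance

# Hypothetical paired observations, chosen only for illustration.
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 6.0]


def sample_covariance(xs, ys):
    """Sum of (x_i - x_bar) * (y_i - y_bar), divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys)) / (len(xs) - 1)


s2 = variance(x)            # sample variance (divides by n - 1)
s = stdev(x)                # sample standard deviation = sqrt(s2)
cv = s / mean(x) * 100      # coefficient of variation, in percent
cov_xy = sample_covariance(x, y)

print(f"s^2 = {s2:.3f}, s = {s:.3f}, CV = {cv:.1f}%, cov = {cov_xy:.3f}")
```

A positive `cov_xy` here means the two hypothetical series tend to move in the same direction.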

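  • Results of covariance:

    $Cov(x, y) > 0$: x and y tend to move in the same direction

    $Cov(x, y) = 0$: there is no linear relationship between x and y

    $Cov(x, y) < 0$: x and y tend to move in opposite directions

    Variance of a discrete random variable X with probabilities P(x):

    $\sigma^2 = \sum_x (x - \mu)^2 P(x) = (x_1 - \mu)^2 P(x_1) + (x_2 - \mu)^2 P(x_2) + \dots$

    Functions of Random Variables

    $P(x)$ is the probability for X; $g(x)$ is a function describing X

    Expected value: $E(g(X)) = \sum_x g(x) P(x)$

    If $g(x) = x$ we get the normal formula for the mean; if $g(x) = (x - \mu)^2$ we get the formula for the variance.

    Special case: if X always takes the same value (a constant), then the mean is that value and the variance is 0.

    If there is a constant b in front of X, we just multiply it with the expected value: $E(bX) = b\mu$

    If there is a constant b in front of X, we square it and multiply it with the variance to get the variance of that expression: $Var(bX) = b^2\sigma^2$

    Example: consider $Z = a + bX$, where X has a mean of $\mu_X$ and a variance of $\sigma_X^2$

    => $\mu_Z = E(a + bX) = a + b\mu_X$

    => $\sigma_Z^2 = Var(a + bX) = b^2\sigma_X^2$

    => standard deviation of Z: $\sigma_Z = |b|\,\sigma_X$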
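
The rules for $Z = a + bX$ can be checked numerically. The sketch below uses a small hypothetical probability table (not from the notes) and a generic helper `expected`, which is an illustrative name for $E(g(X)) = \sum_x g(x)P(x)$.

```python
# Hypothetical discrete distribution: value -> probability (sums to 1).
pmf = {0: 0.2, 1: 0.5, 2: 0.3}


def expected(g, pmf):
    """E(g(X)) = sum over x of g(x) * P(x)."""
    return sum(g(x) * p for x, p in pmf.items())


mu = expected(lambda x: x, pmf)                # g(x) = x gives the mean
var = expected(lambda x: (x - mu) ** 2, pmf)   # g(x) = (x - mu)^2 gives the variance

# Linear transformation Z = a + bX with hypothetical constants a and b.
a, b = 3.0, -2.0
mu_z = expected(lambda x: a + b * x, pmf)
var_z = expected(lambda x: (a + b * x - mu_z) ** 2, pmf)

print(mu, var, mu_z, var_z)
```

Computed directly from the table, `mu_z` agrees with `a + b * mu` and `var_z` with `b**2 * var`, so the added constant `a` has no effect on the variance.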

    SPECIAL CASE (a bit complicated, but easy step by step): $Z = \frac{X - \mu_X}{\sigma_X}$

    Expected value of Z:

    $\mu_Z = E\left(\frac{X - \mu_X}{\sigma_X}\right) = \frac{E(X) - \mu_X}{\sigma_X} = \frac{\mu_X - \mu_X}{\sigma_X} = \frac{0}{\sigma_X} = 0$

    We are simply using the rule that the constants can be separated from the variable X. Instead of $a + bX$ we have the opposite form, $(X - a)/b$, with $a = \mu_X$ and $b = \sigma_X$, so we can use the same rule.

    Variance of Z:

    $\sigma_Z^2 = Var\left(\frac{X - \mu_X}{\sigma_X}\right) = Var\left(\frac{X}{\sigma_X} - \frac{\mu_X}{\sigma_X}\right) = Var\left(\frac{1}{\sigma_X} X\right) = \frac{1}{\sigma_X^2} Var(X) = \frac{1}{\sigma_X^2}\,\sigma_X^2 = 1$

    It looks worse than it is. Here, too, we are using the rule. First we separate the X term from the constant $\mu_X/\sigma_X$, which we can simply drop, because the variance does not take into account a constant that is added or subtracted. Then we take X separately from its factor $1/\sigma_X$: to get the variance we square the factor and multiply it with the variance of X. Because the squared factor is $1/\sigma_X^2$ and the variance of X is $\sigma_X^2$, they cancel and we get the value 1.

    Bernoulli Distribution:

    Just two possibilities: success / failure
    P = probability of success
    1 - P = probability of failure
    Random variable X defined as 1 if success and 0 if failure

    $P(X = 1) = P$ and $P(X = 0) = 1 - P$

    For n independent trials:

    Mean: $\mu = nP$
    Variance: $\sigma^2 = nP(1 - P)$
    Number of sequences with x successes in n trials: $C^n_x = \frac{n!}{x!\,(n - x)!}$
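
Multiplying the number of sequences $C^n_x$ by the probability of any one such sequence, $P^x(1-P)^{n-x}$, gives the probability of x successes in n trials. The following sketch, with hypothetical values for n and P and an illustrative function name, checks that the resulting distribution has mean $nP$ and variance $nP(1-P)$.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# Hypothetical experiment: 5 trials, success probability 0.4.
n, p = 5, 0.4
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Mean and variance computed directly from the probabilities.
mean_x = sum(x * q for x, q in enumerate(probs))
var_x = sum((x - mean_x) ** 2 * q for x, q in enumerate(probs))

print(f"mean = {mean_x:.3f} (nP = {n * p}), variance = {var_x:.3f}")
```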

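  • Bernoulli (binomial) Probability Distribution:

    Has to have a fixed number of trials n
    The probabilities of success and failure add up to 1 and do not change during the experiment
    The trials are independent from each other

    $P(x) = \frac{n!}{x!\,(n - x)!}\,P^x (1 - P)^{n - x}$ => probability of x successes in n trials, with probability of success P on each trial

    Joint Probability Function

    X takes the specific value x and Y takes the value y, as a function of x and y: $P(x, y) = P(X = x \cap Y = y)$

    Marginal probabilities are:

    $P(x) = \sum_y P(x, y)$

    $P(y) = \sum_x P(x, y)$

    Conditional Probability Function

    Y takes the value y when x is specified for X: $P(y \mid x) = \frac{P(x, y)}{P(x)}$

    X takes the value x when y is specified for Y (slightly different notation): $P(X \mid Y = y) = \frac{P(X, Y = y)}{P(Y = y)}$

    X and Y are independent when $P(x)\,P(y) = P(x, y)$

    Covariance: the strength of the linear relationship

    $Cov(X, Y) = E((X - \mu_X)(Y - \mu_Y)) = \sum_x \sum_y (x - \mu_X)(y - \mu_Y)\,P(x, y)$

    Correlation: $\rho = Corr(X, Y) = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}$

    $\rho = 0$: no linear relationship
    $\rho > 0$: positive relationship => when X is high, Y is high as well
    $\rho < 0$: negative relationship => when X is high, Y is low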
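
To make the joint, marginal and conditional formulas concrete, here is a minimal sketch on a hypothetical 2x2 joint probability table. The table values are invented for illustration and are not from the notes.

```python
from math import sqrt

# Hypothetical joint probability table P(x, y); the entries sum to 1.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})

# Marginal probabilities: sum the joint table over the other variable.
p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}
p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}

# Conditional probability P(Y = 1 | X = 1) = P(1, 1) / P(X = 1).
p_y1_given_x1 = joint[(1, 1)] / p_x[1]

# Independence would require P(x, y) = P(x) * P(y) for every cell.
independent = all(
    abs(joint[(x, y)] - p_x[x] * p_y[y]) < 1e-12 for x in xs for y in ys
)

# Means, covariance and correlation from the table.
mu_x = sum(x * p for x, p in p_x.items())
mu_y = sum(y * p for y, p in p_y.items())
cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())
sd_x = sqrt(sum((x - mu_x) ** 2 * p for x, p in p_x.items()))
sd_y = sqrt(sum((y - mu_y) ** 2 * p for y, p in p_y.items()))
rho = cov / (sd_x * sd_y)

print(p_x, p_y, p_y1_given_x1, independent, round(cov, 3), round(rho, 3))
```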

    Cumulative Distribution Function

    Expresses the probability that X does not exceed x: $F(x) = P(X \le x)$

    Example: for a and b, two values of X with $a < b$: $P(a < X < b) = F(b) - F(a)$

  • Normal Distribution Function

    Looks like a bell
    Symmetrical
    Mean, median and mode are equal
    Location is determined by $\mu$: changing it shifts the distribution to the left or right
    Spread is determined by $\sigma$: changing it widens or narrows the distribution
    The random variable has an infinite range
    Any normal distribution can be turned into the standardized normal distribution (Z): $Z = \frac{X - \mu}{\sigma}$
    The standardized normal distribution has a mean of 0 and a variance of 1
    Use Table 1 in the book to get the F(Z) value from a Z value
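
Instead of the printed table, the F(Z) lookup can be sketched with Python's standard library. The mean, standard deviation and x value below are hypothetical.

```python
from statistics import NormalDist

# Hypothetical normal variable X with mean 50 and standard deviation 10.
mu, sigma = 50.0, 10.0
x = 65.0

# Standardize: Z = (X - mu) / sigma.
z = (x - mu) / sigma

# F(z) from the standard normal distribution (mean 0, variance 1).
standard = NormalDist()
f_z = standard.cdf(z)

# The same probability computed on the original scale.
f_x = NormalDist(mu, sigma).cdf(x)

print(f"z = {z}, F(z) = {f_z:.4f}, P(X <= {x}) = {f_x:.4f}")
```

Both routes give the same probability, which is why a single table for Z is enough for any normal distribution.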

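    Joint Cumulative Distribution Function

    Suppose X and Y are continuous random variables
    The function is written F(x, y)
    It gives the probability that X is less than x and, simultaneously, Y is less than y: $F(x, y) = P(X < x \cap Y < y)$

  • For two random variables X and Y, the variance of their difference is: $Var(X - Y) = \sigma_X^2 + \sigma_Y^2 - 2\,Cov(X, Y)$

    Linear combination of X and Y (where a and b are constants): $W = aX + bY$

    The mean of W is $\mu_W = E(W) = E(aX + bY) = a\mu_X + b\mu_Y$

    The variance of W is $\sigma_W^2 = a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\,Corr(X, Y)\,\sigma_X\sigma_Y$

    Note: if X and Y are normally distributed, W is as well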
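
A small numerical check may help to see that the difference formula is just the linear combination with $a = 1$ and $b = -1$. All numbers below are hypothetical.

```python
# Hypothetical means, standard deviations and correlation of X and Y.
mu_x, mu_y = 10.0, 20.0
sd_x, sd_y = 2.0, 3.0
corr = 0.5

# W = aX + bY with a = 1 and b = -1, i.e. the difference X - Y.
a, b = 1.0, -1.0
mu_w = a * mu_x + b * mu_y
var_w = a**2 * sd_x**2 + b**2 * sd_y**2 + 2 * a * b * corr * sd_x * sd_y

# The same variance from the difference formula, using Cov = Corr * sd_x * sd_y.
cov = corr * sd_x * sd_y
var_diff = sd_x**2 + sd_y**2 - 2 * cov

print(mu_w, var_w, var_diff)
```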

    Part 3

    Descriptive statistics: collecting, presenting and describing data

    Inferential statistics: drawing conclusions or decisions concerning a population based on sample data

    Sampling Distributions

    Distribution of all values of a sample statistic over the possible samples from a population

    The steps to develop a sampling distribution:

    1. List the given values (example: N = 4, X = age of the 4 individuals, values of X = 18, 20, 22, 24)

    2. Calculate mean and standard deviation (population)
       a. $\mu = \frac{18 + 20 + 22 + 24}{4} = 21$
       b. $\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = 2.236$

    3. All possible sample combinations (n = 2) in a table:

             18       20       22       24
       18    18,18    18,20    18,22    18,24
       20    20,18    20,20    20,22    20,24
       22    22,18    22,20    22,22    22,24
       24    24,18    24,20    24,22    24,24

    4. Then draw a table of the sample means:

             18    20    22    24
       18    18    19    20    21
       20    19    20    21    22
       22    20    21    22    23
       24    21    22    23    24

    5. => 16 sample means

    6. Summary of the sampling distribution:
       a. $\mu_{\bar{X}} = \frac{\sum \bar{X}_i}{N} = \frac{18+19+19+20+20+20+21+21+21+21+22+22+22+23+23+24}{16} = 21$
       b. $\sigma_{\bar{X}} = \sqrt{\frac{\sum (\bar{X}_i - \mu_{\bar{X}})^2}{N}} = \sqrt{\frac{(18-21)^2 + (19-21)^2 + (19-21)^2 + \dots + (24-21)^2}{16}} = 1.58$

    7. Comparing the population and the sampling distribution:
       a. Population:
          i. N = 4
          ii. $\mu = 21$
          iii. $\sigma = 2.236$
       b. Sample:
          i. n = 2
          ii. $\mu_{\bar{X}} = 21$
          iii. $\sigma_{\bar{X}} = 1.58$
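
The seven steps above can be replayed in a few lines of Python. This is only a sketch: it assumes, as the 4x4 tables do, that samples of size 2 are drawn with replacement and that order matters.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

# Step 1: the population of four ages.
population = [18, 20, 22, 24]
n = 2  # sample size

# Step 2: population mean and population standard deviation (divides by N).
mu = mean(population)
sigma = pstdev(population)

# Step 3: all 16 ordered samples of size 2, drawn with replacement.
samples = list(product(population, repeat=n))

# Steps 4-5: one mean per sample.
sample_means = [mean(s) for s in samples]

# Step 6: mean and standard deviation of the 16 sample means.
mu_xbar = mean(sample_means)
sigma_xbar = pstdev(sample_means)

# Step 7: compare with the population values.
print(f"population: mu = {mu}, sigma = {sigma:.3f}")
print(f"sample means: mu = {mu_xbar}, sigma = {sigma_xbar:.3f}")
print(f"sigma / sqrt(n) = {sigma / sqrt(n):.3f}")
```

The last line shows that the spread of the sample means equals the population standard deviation divided by the square root of the sample size.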

    Expected Value of the Sample Mean Distribution

    $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$

    Standard Error of the Mean

    Describes the variability in the mean: $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

    Decreases when the sample size increases

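  • If the Population is Normal

    The sampling distribution of $\bar{X}$ is also normally distributed, with $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

    Z Value for Sample Mean Distributions

    $Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$

    $\bar{X}$ = sample mean, $\mu$ = population mean, $\sigma$ = population standard deviation, n = sample size

    If the Population is not Normal

    The sampling distribution is approximately normal if n > 25

    Example: $\sigma = 3$, $\mu = 8$, n = 36. Probability that $\bar{X}$ is between 7.8 and 8.2 = ?

    n > 25 => approximately normal => $\mu_{\bar{X}} = \mu = 8$ and $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5$

    $P(7.8 < \bar{X} < 8.2) = P\left(\frac{7.8 - 8}{0.5} < Z < \frac{8.2 - 8}{0.5}\right) = P(-0.4 < Z < 0.4) = 0.3108$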
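
The example with $\mu = 8$, $\sigma = 3$ and $n = 36$ can be worked through in code. This sketch uses the standard library's normal distribution in place of the Z table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values of the example: population mean 8, standard deviation 3, n = 36.
mu, sigma, n = 8.0, 3.0, 36

# Standard error of the mean: sigma / sqrt(n).
se = sigma / sqrt(n)

# Z values for the two bounds 7.8 and 8.2.
z_low = (7.8 - mu) / se
z_high = (8.2 - mu) / se

# P(7.8 < X_bar < 8.2) = F(z_high) - F(z_low) under the standard normal.
standard = NormalDist()
prob = standard.cdf(z_high) - standard.cdf(z_low)

print(f"se = {se}, z from {z_low:.2f} to {z_high:.2f}, probability = {prob:.4f}")
```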

  • Chi-Square Distribution

    Depends on the degrees of freedom = n - 1 = d.f.
    Table 7
    $\chi^2_{n-1} = \frac{(n - 1)s^2}{\sigma^2}$

    Example to find a probability

    A freezer has to hold its temperature with little variation: standard deviation of no more than 4 => $\sigma = 4$, $\sigma^2 = 16$
    A sample of 14 freezers is tested => n = 14 => d.f. = 13
    What is the probability that the sample variance exceeds 27.52? => $P(s^2 > 27.52) = ?$

    $P(s^2 > 27.52) = P\left(\frac{(n - 1)s^2}{16} > \frac{(14 - 1) \cdot 27.52}{16}\right) = P(\chi^2_{13} > 22.36) = 0.05$

    Table 7: d.f. 13 and the value 22.36 => P = 0.05

    Finding the Chi Value

    n - 1 = 14 - 1 = 13 = d.f., $\alpha = 0.05$ => $\chi^2_{13,\,0.05} = 22.36$
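
The arithmetic of the freezer example can be checked with the sketch below. It only computes the chi-square statistic; the critical value 22.36 is copied from Table 7 as quoted in the notes, not calculated here.

```python
# Freezer example: n = 14 freezers, sigma = 4, variance limit 27.52.
n = 14
sigma2 = 4 ** 2          # population variance, sigma^2 = 16
s2_limit = 27.52         # sample variance whose exceedance probability we want

# Chi-square statistic with n - 1 = 13 degrees of freedom.
df = n - 1
chi2_stat = df * s2_limit / sigma2

# Table 7 value quoted in the notes for d.f. = 13 and upper-tail area 0.05.
table_value_005 = 22.36

print(f"d.f. = {df}, statistic = {chi2_stat:.2f}, table value = {table_value_005}")
```

Because the statistic matches the tabulated value for an upper-tail area of 0.05, the probability asked for is 0.05.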

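    Point and Interval Estimates

    A point estimate is a single number
    An interval estimate is the range from a lower but still reliable point to an upper but still reliable point, also known as a confidence interval
    If $P(a < \theta < b) = 1 - \alpha$, then the interval from a to b is a $100(1 - \alpha)\%$ confidence interval for $\theta$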

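  • t Distribution

    Consider a sample of n observations
    with a mean of $\bar{x}$ and a standard deviation of s
    from a normally distributed population with a mean of $\mu$
    n - 1 degrees of freedom

    Then the variable $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$ follows the t distribution

    We use the t distribution when the population standard deviation $\sigma$ is unknown and use s instead (s = sample standard deviation) => not that accurate because we use just a sample

    Assumptions:

    The population standard deviation is unknown
    The population is normally distributed
    If the population is not normal, use a bigger sample

    Use the t distribution for the $(1 - \alpha)$ confidence interval estimate:

    $\bar{x} - t_{n-1,\alpha/2}\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{n-1,\alpha/2}\,\frac{s}{\sqrt{n}}$

    t depends on the degrees of freedom; use Table 8 for solving

    Example

    Sample: n = 25, $\bar{x} = 50$, s = 8; form a 95% confidence interval for $\mu$

    d.f. = 25 - 1 = 24; $(1 - \alpha) = 0.95$ => $\alpha = 0.05$ => $\alpha/2 = 0.025$; $t_{n-1,\alpha/2} = t_{24,\,0.025} = 2.0639$

    $50 - (2.0639)\frac{8}{\sqrt{25}} < \mu < 50 + (2.0639)\frac{8}{\sqrt{25}}$, i.e. $46.698 < \mu < 53.302$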
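
The t interval of the example can be reproduced as follows. The critical value 2.0639 is taken from the notes (Table 8); the sketch does not compute t quantiles itself.

```python
from math import sqrt

# Example values: n = 25, sample mean 50, sample standard deviation 8.
n, x_bar, s = 25, 50.0, 8.0

# t value for 24 degrees of freedom and alpha/2 = 0.025, as read from Table 8.
t_crit = 2.0639

# Margin of error and the two interval bounds.
margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"{lower:.3f} < mu < {upper:.3f}")
```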

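  • To explain, I use the following example:

    Random sample of 100 people, 25 are left-handed; 95% confidence interval for the true proportion of left-handers

    $\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} < P < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$

    $\hat{p} = \frac{25}{100}$, n = 100

    $z_{\alpha/2}$: 1 - 0.95 = 0.05 => 0.05 / 2 = 0.025 => 0.95 + 0.025 = 0.975; Z table: look in the F(Z) column for 0.975 => 1.96

    $\frac{25}{100} - 1.96\sqrt{\frac{0.25(1 - 0.25)}{100}} < P < \frac{25}{100} + 1.96\sqrt{\frac{0.25(1 - 0.25)}{100}}$

    $0.1651 < P < 0.3349$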
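
The left-hander example can be sketched in Python, with the Z table lookup replaced by the inverse normal CDF from the standard library. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Example: 25 left-handers in a random sample of 100, 95% confidence.
n, successes, confidence = 100, 25, 0.95

p_hat = successes / n
alpha = 1 - confidence

# z value with F(z) = 1 - alpha/2 = 0.975 (about 1.96).
z = NormalDist().inv_cdf(1 - alpha / 2)

# Margin of error and interval bounds for the population proportion P.
margin = z * sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin

print(f"z = {z:.2f}: {lower:.4f} < P < {upper:.4f}")
```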

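  • Confidence interval for the difference between two means (population variances known):

    $(\bar{x} - \bar{y}) - z_{\alpha/2}\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}} < \mu_x - \mu_y < (\bar{x} - \bar{y}) + z_{\alpha/2}\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}$

    $\sigma_x^2$ and $\sigma_y^2$ are unknown

    Assumptions:

    Samples are independent and random
    Populations are normally distributed
    Population variances are unknown and assumed unequal

    Use a t distribution with v degrees of freedom:

    $v = \frac{\left(\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}\right)^2}{\frac{\left(s_x^2 / n_x\right)^2}{n_x - 1} + \frac{\left(s_y^2 / n_y\right)^2}{n_y - 1}}$

    The confidence interval is described as follows:

    $(\bar{x} - \bar{y}) - t_{v,\alpha/2}\sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}} < \mu_x - \mu_y < (\bar{x} - \bar{y}) + t_{v,\alpha/2}\sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}}$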
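
The degrees-of-freedom formula for unequal variances is easy to mistype, so a small helper may be useful. The function name `welch_df` and the input numbers are hypothetical; only the formula itself comes from the notes.

```python
def welch_df(s2_x, n_x, s2_y, n_y):
    """Degrees of freedom v for two samples with unequal variances."""
    a = s2_x / n_x
    b = s2_y / n_y
    return (a + b) ** 2 / (a ** 2 / (n_x - 1) + b ** 2 / (n_y - 1))


# Hypothetical sample variances and sample sizes.
v = welch_df(s2_x=4.0, n_x=10, s2_y=9.0, n_y=12)

# With equal variances and equal sizes the formula reduces to 2(n - 1).
v_equal = welch_df(s2_x=5.0, n_x=10, s2_y=5.0, n_y=10)

print(f"v = {v:.2f}, equal case v = {v_equal:.2f}")
```

In practice v is usually not a whole number; a common convention is to round it down before looking up the t table.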

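  • First determine everything:

    n - 1 = 17 - 1 = 16; $\alpha/2 = (1 - 0.95)/2 = 0.025$; $1 - \alpha/2 = 0.975$

    Then find the chi-square values:

    $\chi^2_{n-1,\,\alpha/2} = \chi^2_{17-1,\,0.025} = 28.85$

    $\chi^2_{n-1,\,1-\alpha/2} = \chi^2_{17-1,\,0.975} = 6.91$

    Then find $s^2 = 74^2$

    Now fill it into the formula: $\frac{(n - 1)s^2}{\chi^2_{n-1,\,\alpha/2}} < \sigma^2 < \frac{(n - 1)s^2}{\chi^2_{n-1,\,1-\alpha/2}}$

    Null Hypothesis, e.g. $H_0: \mu = 3$

    Refers to the status quo ("not guilty")
    Always contains =, $\le$ or $\ge$
    May or may not be rejected

    Alternative Hypothesis

    Assumes the opposite of $H_0$ (in our example: $H_1: \mu \ne 3$)
    Always contains $\ne$, < or >
    May or may not be supported

    Example: the population mean age is 50 => $H_0: \mu = 50$

    Now we select a sample and calculate the mean. Let's suppose it was $\bar{X} = 20$ => it is unlikely that the null hypothesis is true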

  • Level of Significance

    Defines the rejection region of the sampling distribution
    Written as $\alpha$
    Typical values are 0.01, 0.05, 0.1
    Is selected by the researcher
    Provides the critical values

    Types of tests (3 is an example for any number)

    Two-tail test: $H_0: \mu = 3$, $H_1: \mu \ne 3$

    Upper-tail test: $H_0: \mu \le 3$, $H_1: \mu > 3$

    Lower-tail test: $H_0: \mu \ge 3$, $H_1: \mu < 3$

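  • Consider the test:

    $H_0: \mu = \mu_0$, $H_1: \mu > \mu_0$

    The decision rule is: reject $H_0$ if $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} > z_\alpha$

    Alternate rule: reject $H_0$ if $\bar{X} > \mu_0 + z_\alpha\,\frac{\sigma}{\sqrt{n}}$

    P-Value

    Probability of obtaining a test statistic more extreme than the observed sample value, given that $H_0$ is true
    Also called the observed level of significance
    Shows the smallest value of $\alpha$ for which $H_0$ can be rejected

    Convert the sample result (e.g. $\bar{x}$) to a test statistic (e.g. z statistic)

    Example of an upper-tail test: obtain the p-value

    p-value = $P\left(Z > \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}, \text{ given that } H_0 \text{ is true}\right) = P\left(Z > \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \,\Big|\, \mu = \mu_0\right)$

    Decision rule: compare the p-value to $\alpha$

    If p-value < $\alpha$, reject $H_0$
    If p-value $\ge \alpha$, do not reject $H_0$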
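
Both decision rules for the upper-tail z test, the critical value and the p-value, can be sketched side by side. All input numbers below are hypothetical and serve only to show that the two rules agree.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical upper-tail test: H0: mu = 50 against H1: mu > 50.
mu_0, sigma, n = 50.0, 10.0, 25
x_bar = 53.5
alpha = 0.05

standard = NormalDist()

# Test statistic z = (x_bar - mu_0) / (sigma / sqrt(n)).
z = (x_bar - mu_0) / (sigma / sqrt(n))

# Rule 1: compare z with the critical value z_alpha.
z_alpha = standard.inv_cdf(1 - alpha)
reject_by_critical_value = z > z_alpha

# Rule 2: compare the p-value P(Z > z) with alpha.
p_value = 1 - standard.cdf(z)
reject_by_p_value = p_value < alpha

print(f"z = {z:.2f}, z_alpha = {z_alpha:.3f}, p-value = {p_value:.4f}")
print(reject_by_critical_value, reject_by_p_value)
```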

    One-Tail Test

    The alternative hypothesis focuses on one direction
    If $H_1$ is ">" than something, it is an upper-tail test
    If $H_1$ is "<" than something, it is a lower-tail test
    Lower- and upper-tail tests have just one critical value, since the rejection area is in only one tail

    Two-Tail Test

    Two critical values defining the two areas of rejection

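  • t Test of Hypothesis for the Mean ($\sigma$ unknown)

    Convert the sample result ($\bar{x}$) to a t test statistic

    Consider the test: $H_0: \mu = \mu_0$, $H_1: \mu > \mu_0$

    The decision rule is: reject $H_0$ if $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} > t_{n-1,\alpha}$

    For a two-tailed test: $H_0: \mu = \mu_0$, $H_1: \mu \ne \mu_0$

    The decision rule is: reject $H_0$ if $t < -t_{n-1,\alpha/2}$ or $t > t_{n-1,\alpha/2}$

    Test of the Population Proportion

    Involves categorical values
    Two outcomes:
    success (a certain characteristic is present)
    failure (a certain characteristic is not present)
    The proportion of the population is written as P
    The sample size is large
    The sample proportion in the success category is written as $\hat{p}$

    $\hat{p} = \frac{x}{n} = \frac{\text{number of successes in sample}}{\text{sample size}}$

    If $nP(1 - P) > 9$, $\hat{p}$ can be seen as approximately normally distributed. Therefore the mean is $\mu_{\hat{p}} = P$ and the standard deviation is $\sigma_{\hat{p}} = \sqrt{\frac{P(1 - P)}{n}}$

    Hypothesis Test for Proportion ($nP(1 - P) > 9$)

    Z value, because normally distributed: $Z = \frac{\hat{p} - P_0}{\sqrt{\frac{P_0(1 - P_0)}{n}}}$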
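
The Z statistic for a proportion can be illustrated with a short sketch. The sample numbers are hypothetical, and the two-tailed alternative is an assumption made for this example; the notes only give the statistic itself.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical two-tailed test: H0: P = 0.5 against H1: P != 0.5.
n, successes = 200, 116
p_0, alpha = 0.5, 0.05

# Condition from the notes for the normal approximation: nP(1 - P) > 9.
approximation_ok = n * p_0 * (1 - p_0) > 9

# Sample proportion and Z statistic.
p_hat = successes / n
z = (p_hat - p_0) / sqrt(p_0 * (1 - p_0) / n)

# Two-tailed decision: compare |z| with z_{alpha/2}, or the p-value with alpha.
standard = NormalDist()
z_crit = standard.inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - standard.cdf(abs(z)))
reject = abs(z) > z_crit

print(f"p_hat = {p_hat}, z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {reject}")
```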