Stat 306: Finding Relationships in Data. Lecture 6, Section 2.6

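  • Recap from last lecture: 2.5 (continued)

    | Population parameter (“something we would like to estimate”) | Sample statistic (“estimator”) | Estimator as a random variable | Expected value of the estimator | Variance of the estimator | Standard error of the estimator | Confidence interval |
    |---|---|---|---|---|---|---|
    | β0 | b0 | B0 | E[B0] | Var[B0] | se(b0) | C.I. for β0 |
    | β1 | b1 | B1 | E[B1] | Var[B1] | se(b1) | C.I. for β1 |
    | σ² | s² | S² | E[S²] | Var[S²] | se(s²) | C.I. for σ² |
    | θ | θ̂ | Θ̂ | E[Θ̂] | Var[Θ̂] | se(θ̂) | C.I. for θ |

    Step 0: From θ, define the estimator, θ̂.

    Step 1: Consider the sample statistic, θ̂, as a random variable, Θ̂.

    Step 2: Determine E[Θ̂] (to confirm it’s unbiased) and Var[Θ̂] (to calculate the se).

    Step 3: Define se(θ̂) as the square root of the estimated Var[Θ̂].

    Step 4: Define the 100(1 − α)% C.I. for θ.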

  • Confused about homogeneity vs. the non-constant width of confidence intervals?

    [Figure: scatterplot of y against x, both axes from 0 to 100, with a blue dashed confidence band]

    σ² is the variance of Y; it is constant regardless of the value of x.

    The blue dashed line is the confidence interval for the subpopulation mean. In other words, it represents the variability in our estimate of the mean of Y as x changes.

  • Predictions and prediction intervals

    Suppose we now want to make a prediction for a new value of x. Example: Suppose we would like to predict how much money (Y) someone aged 50 years old (X = 50) will have.

    [Figure: scatterplot of y against x, both axes from 0 to 100]

  • Example: Suppose we would like to predict how much money (Y) someone aged X = 50 years old will have.

    This hypothetical new person aged 50 is sometimes called “an out-of-sample unit with value x*”, where x* = 50. Our best estimate, also known as the “point prediction”, would be equal to b0 + b1(50) = 45.1.


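  • Example: Suppose we would like to predict how much money (Y) someone aged X = 60 years old will have (x* = 60).

    Our prediction: $\hat{Y}^* = B_0 + B_1 x^*$

    The truth: $Y^* = \beta_0 + \beta_1 x^* + \epsilon^*$

    The difference between our prediction and the truth is the error: $E = Y^* - \hat{Y}^*$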


  • The variance of the prediction error, $E = Y^* - \hat{Y}^*$ (the truth minus our prediction), is:

    $\mathrm{Var}(E) = \mathrm{Var}(Y^*) + \mathrm{Var}(\hat{Y}^*) - 2\,\mathrm{Cov}(Y^*, \hat{Y}^*)$

    $\mathrm{Cov}(Y^*, \hat{Y}^*)$ is equal to 0, since the two terms are independent: the new unit is out-of-sample, so it played no part in computing the prediction.

  • The variance of our prediction, $\mathrm{Var}(\hat{Y}^*)$, is $\sigma^2\left[\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right]$.

    Note: the variance of the truth is $\mathrm{Var}(Y^*) = \sigma^2$, so the prediction error $E = Y^* - \hat{Y}^*$ has $\mathrm{Var}(E) = \sigma^2\left[1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right]$.

  • $se(E) = s\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}$

  • [Figure: two scatterplots of y against x, both axes from 0 to 100, illustrating predictions and prediction intervals]

  • Age vs. Money

    Objective: The purpose of this observational study was to demonstrate if, and to what extent, age is associated with money.

    Design and Methods: We collected a random sample of individuals and for each determined their age (recorded in years) and the amount of money (in dollars) in their accounts. Analysis of the data was done using linear regression.

    Results: We obtained a random sample of n = 9 subjects. There is a statistically significant association between age and money (p-value = 0.036). For every additional year in age, an individual’s amount of money increases on average by an estimated $0.55 (95% C.I. = [$0.05, $1.05]).

    Conclusions: We found that, as hypothesized, age is associated with money. In our sample, age accounted for about half of the variability observed in money (R² = 0.49). We predict that a 50-year-old will have $45.1 (95% P.I. = [$5.6, $84.5]), whereas a 40-year-old will have $39.6 (95% P.I. = [$0.8, $78.4]).

    Small Print: The analysis rests on the following assumptions:

    - the observations are independently and identically distributed.
    - the response variable, money, is normally distributed.
    - homoscedasticity of residuals, or equal variance.
    - the relationship between response and predictor variables is linear.

    For parameter β1: the p-value and the C.I. reported in the Results.

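  • se(subpopulation mean) vs. se(prediction error)

    Subpopulation mean (the estimated mean of Y at x = x*): $se = s\sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}$

    Whereas $se(E)$ is: $s\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}$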
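The two standard errors can be compared numerically. The sketch below uses the values reported on the slides (n = 9, b0 = 17.7, b1 = 0.55, s = 15.5). Two things are assumptions rather than slide content: the list `ages` is one reading of the nine ages on the data slide, and `t_crit = 2.365` is the 0.975 quantile of the t distribution with 7 degrees of freedom. Because the coefficients are rounded, the output should only roughly match the prediction intervals quoted in the report.

```python
import math

# Values reported on the slides.
n, b0, b1, s = 9, 17.7, 0.55, 15.5

# Assumption: one reading of the nine ages shown on the data slide.
ages = [82, 22, 45, 71, 29, 12, 9, 18, 24]
# Assumption: 0.975 quantile of the t distribution with n - 2 = 7 df.
t_crit = 2.365

x_bar = sum(ages) / n
sxx = sum((x - x_bar) ** 2 for x in ages)


def se_subpop_mean(x_star):
    """Standard error of the estimated mean of Y at x_star."""
    return s * math.sqrt(1 / n + (x_star - x_bar) ** 2 / sxx)


def se_pred_error(x_star):
    """Standard error of the prediction error for a new unit at x_star."""
    return s * math.sqrt(1 + 1 / n + (x_star - x_bar) ** 2 / sxx)


def prediction_interval(x_star):
    """Point prediction and approximate 95% prediction interval at x_star."""
    y_hat = b0 + b1 * x_star
    half_width = t_crit * se_pred_error(x_star)
    return y_hat, y_hat - half_width, y_hat + half_width


for x_star in (40, 50):
    y_hat, low, high = prediction_interval(x_star)
    print(f"x* = {x_star}: se(mean) = {se_subpop_mean(x_star):.2f}, "
          f"se(E) = {se_pred_error(x_star):.2f}, "
          f"prediction = {y_hat:.1f}, 95% P.I. = [{low:.1f}, {high:.1f}]")
```

The extra 1 under the square root is the variance of the new individual around the subpopulation mean, which is why the prediction interval stays wide even where the confidence band for the mean is narrow.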

  • 2.6 Explanation of Student t quantiles in the interval estimates

    2.6.1. History lesson about the t-test
    2.6.2. Three important things to know about a normal random variable
    2.6.3. Estimators as Random Variables (one more time!)
    2.6.4. Explanation of Student t quantiles


  • 2.6.1. History lesson about the t-test

    “Student” is the publication pseudonym for William Gosset, who developed methods for inference of means for small samples while working at Guinness Brewery (Ireland) in the early 1900s. https://en.wikipedia.org/wiki/William_Sealy_Gosset

    William Sealy Gosset (aka “Student”): “Is this batch of beer any different than the standard?” “Let’s have a taste test! … t-test anyone?”


  • 2.6.2. Three important things to know about a normal random variable

    Thing 1: Linear combinations of independent normal random variables also have normal distributions! (see Appendix B)

    For example: Let W1 be a normal random variable and W2 be a normal random variable, independent of W1. Then W3 = aW1 + bW2 is a normal r.v. for any numbers a and b (not both zero).
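Thing 1 can be checked by simulation. The sketch below is only an illustration: the means, standard deviations, and coefficients a and b are hypothetical values, not from the lecture. It draws independent W1 and W2, forms W3 = aW1 + bW2, and compares the simulated mean, standard deviation, and central 95% coverage with what a normal distribution with mean a·μ1 + b·μ2 and variance a²σ1² + b²σ2² would give.

```python
import random
import statistics

random.seed(306)  # fixed seed so the run is repeatable

# Hypothetical parameters, chosen only for illustration.
mu1, sigma1 = 1.0, 2.0    # W1 ~ N(1, 2^2)
mu2, sigma2 = -3.0, 1.0   # W2 ~ N(-3, 1^2)
a, b = 2.0, 3.0

n_draws = 100_000
w3 = [a * random.gauss(mu1, sigma1) + b * random.gauss(mu2, sigma2)
      for _ in range(n_draws)]

# Theory for independent W1 and W2:
# W3 ~ N(a*mu1 + b*mu2, a^2*sigma1^2 + b^2*sigma2^2)
theory_mean = a * mu1 + b * mu2
theory_sd = (a ** 2 * sigma1 ** 2 + b ** 2 * sigma2 ** 2) ** 0.5

sim_mean = statistics.fmean(w3)
sim_sd = statistics.stdev(w3)

# For a normal variable, about 95% of draws fall within 1.96 sd of the mean.
inside = sum(abs(w - theory_mean) <= 1.96 * theory_sd for w in w3) / n_draws

print(f"mean: simulated {sim_mean:.3f}, theory {theory_mean:.3f}")
print(f"sd:   simulated {sim_sd:.3f}, theory {theory_sd:.3f}")
print(f"fraction within 1.96 sd: {inside:.3f}")
```

Matching the mean, standard deviation, and one coverage figure is a sanity check, not a proof of normality; the proof is the one referenced in Appendix B.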


  • Thing 2: A normal random variable can be converted to a standard normal random variable: if $W \sim N(\mu, \sigma^2)$, then $Z = (W - \mu)/\sigma \sim N(0, 1)$.


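  • Thing 3: If the variance is unknown, we must use the t distribution: replacing σ by an estimate S when standardizing gives a Student t random variable, not a standard normal one, with degrees of freedom equal to those of the variance estimate.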
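A small simulation shows why Thing 3 matters for small samples. This sketch uses the simpler case of a sample mean rather than a regression slope, with hypothetical values mu = 10, sigma = 3 and n = 8, so the variance estimate has 7 degrees of freedom (the same as n − 2 with n = 9 in the regression example). The cutoff 2.365 is assumed to be the 0.975 quantile of the t distribution with 7 degrees of freedom.

```python
import math
import random

random.seed(306)  # fixed seed so the run is repeatable

# Hypothetical population values, for illustration only.
mu, sigma = 10.0, 3.0
n = 8            # small sample: the variance estimate has n - 1 = 7 df
n_reps = 20_000

z_stats, t_stats = [], []
for _ in range(n_reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    s = math.sqrt(sum((v - x_bar) ** 2 for v in sample) / (n - 1))
    z_stats.append((x_bar - mu) / (sigma / math.sqrt(n)))  # sigma known
    t_stats.append((x_bar - mu) / (s / math.sqrt(n)))      # sigma estimated


def tail_fraction(stats, cutoff):
    """Fraction of statistics whose absolute value exceeds cutoff."""
    return sum(abs(v) > cutoff for v in stats) / len(stats)


z_beyond_196 = tail_fraction(z_stats, 1.96)
t_beyond_196 = tail_fraction(t_stats, 1.96)
t_beyond_2365 = tail_fraction(t_stats, 2.365)

print(f"known sigma,     |stat| > 1.96:  {z_beyond_196:.3f}")
print(f"estimated sigma, |stat| > 1.96:  {t_beyond_196:.3f}")
print(f"estimated sigma, |stat| > 2.365: {t_beyond_2365:.3f}")
```

With the estimated standard deviation, the normal cutoff 1.96 is exceeded noticeably more often than 5% of the time; the wider t cutoff restores the 5% rate.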


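  • 2.6.3. Estimators as Random Variables (one more time!)

    Step 1 for the slope: consider the sample statistic, b1, as a random variable, B1.

    Recall: $Y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$, independently,

    and: $E[B_1] = \beta_1$,

    and: $\mathrm{Var}[B_1] = \sigma^2 / S_{xx}$, where: $S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2$.

    Since B1 is a linear combination of the Yi’s (normal RVs), then (with Thing 1): $B_1 \sim N(\beta_1, \sigma^2 / S_{xx})$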

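  • Step 1 for the intercept: consider the sample statistic, b0, as a random variable, B0.

    Recall: $B_0 = \bar{Y} - B_1 \bar{x}$

    Also: $E[B_0] = \beta_0$ and $\mathrm{Var}[B_0] = \sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right]$

    For the intercept, we can, again, make use of the fact that B0 is a linear combination of normal random variables (Thing 1): $B_0 \sim N\left(\beta_0, \sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right]\right)$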


  • We have that:

    And again, a linear combination of normal random variables is a normal random variable (Thing 1):


  • 2.6.4. Explanation of Student t quantiles

    With Thing 1, we have: $B_1 \sim N\left(\beta_1, \frac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right)$

    With Thing 2, we have: $\frac{B_1 - \beta_1}{\sigma / \sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}} \sim N(0, 1)$

    But we do not know the variance. We only have an estimate of the variance, so (with Thing 3): $\frac{B_1 - \beta_1}{S / \sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}} \sim t_{n-2}$

    And therefore: $P\left(-t_{n-2,\,1-\alpha/2} \le \frac{B_1 - \beta_1}{se(B_1)} \le t_{n-2,\,1-\alpha/2}\right) = 1 - \alpha$

    With α = 0.05: 95% C.I. for β1 = $b_1 \pm t_{n-2,\,0.975}\, se(b_1)$

    where: $se(b_1) = s / \sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}$, and $t_{n-2,\,0.975}$ is the 0.975 quantile of the Student t distribution with n − 2 degrees of freedom.
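As a worked check, the 95% C.I. for β1 can be recomputed from the summary values on the slides (n = 9, b1 = 0.55, s = 15.5). This is a sketch under two assumptions that are not slide content: `ages` is one reading of the nine ages on the data slide, and `t_crit = 2.365` is the 0.975 quantile of the t distribution with 7 degrees of freedom.

```python
import math

# Values reported on the slides.
n, b1, s = 9, 0.55, 15.5

# Assumption: one reading of the nine ages shown on the data slide.
ages = [82, 22, 45, 71, 29, 12, 9, 18, 24]
# Assumption: 0.975 quantile of the t distribution with n - 2 = 7 df.
t_crit = 2.365

x_bar = sum(ages) / n
sxx = sum((x - x_bar) ** 2 for x in ages)

se_b1 = s / math.sqrt(sxx)        # standard error of the slope
ci_low = b1 - t_crit * se_b1      # lower end of the 95% C.I.
ci_high = b1 + t_crit * se_b1     # upper end of the 95% C.I.

print(f"se(b1) = {se_b1:.3f}")
print(f"95% C.I. for beta1 = [{ci_low:.2f}, {ci_high:.2f}]")
```

Under these assumptions the interval agrees, to two decimals, with the [$0.05, $1.05] quoted in the Age vs. Money report.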



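  • Hypothesis Test

    H0: β1 = 0 (“Null” hypothesis)
    H1: β1 ≠ 0 (“Alternative” hypothesis)

    We have: $\frac{B_1 - \beta_1}{se(B_1)} \sim t_{n-2}$, where $se(B_1) = S / \sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}$

    Therefore, “under the null” (β1 = 0), we have: $\frac{B_1}{se(B_1)} \sim t_{n-2}$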


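  • Even if we decide to record “Age” (x) in months and “Money” (Y) in pennies, “under the null”, we still have: $\frac{B_1}{se(B_1)} \sim t_{n-2}$

    and: $P\left(\left|\frac{B_1}{se(B_1)}\right| > t_{n-2,\,1-\alpha/2}\right) = \alpha$

    Therefore… If β1 (the slope) was actually equal to 0, it would be very unlikely that the absolute value of the t-statistic, |b1 / se(b1)|, would be very large.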
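The claim that the units do not matter can be checked directly. The sketch below computes the slope t-statistic b1 / se(b1) from scratch. The data are hypothetical, made up for illustration (they are not the lecture's sample), and `slope_t_stat` is a helper name introduced here. The same data are then re-expressed in months and pennies.

```python
import math


def slope_t_stat(x, y):
    """t-statistic b1 / se(b1) for the least-squares slope of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / (n - 2))      # residual standard deviation
    return b1 / (s / math.sqrt(sxx))  # b1 divided by se(b1)


# Hypothetical data, made up for illustration (not the lecture's sample).
age_years = [20, 25, 31, 38, 44, 52, 60, 67, 75]
money_dollars = [22, 41, 30, 35, 52, 38, 61, 49, 66]

age_months = [12 * a for a in age_years]
money_pennies = [100 * m for m in money_dollars]

t_original = slope_t_stat(age_years, money_dollars)
t_rescaled = slope_t_stat(age_months, money_pennies)

print(f"t-stat (years, dollars):   {t_original:.6f}")
print(f"t-stat (months, pennies):  {t_rescaled:.6f}")
```

Rescaling multiplies b1 and se(b1) by the same factor (here 100/12), so their ratio is unchanged, which is why the same t quantiles apply whatever the units.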


  • Explanation of Thing 3:
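One standard argument for Thing 3 in the slope case can be sketched as follows. It is offered as a sketch under the usual normal-error assumptions, not necessarily the argument given in the lecture. The symbols Z, V and T are introduced here for illustration, and the chi-square distribution of V and its independence from Z are taken as known results.

```latex
Z = \frac{B_1 - \beta_1}{\sigma / \sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}} \sim N(0, 1),
\qquad
V = \frac{(n-2)\, S^2}{\sigma^2} \sim \chi^2_{n-2},
\qquad
Z \text{ and } V \text{ independent}

T = \frac{Z}{\sqrt{V / (n-2)}}
  = \frac{B_1 - \beta_1}{S / \sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}}
  \sim t_{n-2}
```

The unknown σ cancels in the ratio, which is why the statistic built from the estimate S has a known distribution: Student t with n − 2 degrees of freedom.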

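  • Age vs. Money

    Population: dollars ($) in bank account, by age in years.

    Population parameters: β0, β1, σ²

    Hypothesis Test (for parameter β1): H0: β1 = 0, H1: β1 ≠ 0

    Sample, n = 9

    PREDICTOR variable x: Age in years

    RESPONSE variable Y: dollars ($) in bank account

    [Data table with columns x and y; values as extracted: 82, 22, 45, 71, 29, 12, 9, 18, 24; 71, 54, 43, 45, 21, 11, 30, 45, 10]

    Sample statistics (linear regression): b0 = 17.7, b1 = 0.55, s = 15.5, R² = 0.49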