Day 2, Morning: The Logic of Distributions
Instructor: Marshall A. Taylor
844 Flanner Hall
mtaylo15@nd.edu
marshalltaylor.net

Recap

•  Yesterday morning we talked about:
    – The descriptive and inferential purposes of statistics.
    – The difference between samples and populations, the issues that arise because of sampling error and sample bias, and the ways in which probability theory can be used to address the former.
    – The basics of probability theory.

Recap

•  Yesterday evening we talked about:
    – The difference between univariate, bivariate, and multivariate statistics.
    – What a variable is and the general forms that it can take: nominal, ordinal, or continuous.
    – Measures of central tendency for each type of variable.
    – Measures of dispersion.

Game Plan for Today

•  Morning
    – We will bring these previous lectures together to show how we can use probability theory to assess how representative our sample distribution is of the population distribution.

•  Evening
    – After outlining the asymptotic theory of probability distributions in the morning, we will then examine basic univariate tests of statistical inference to quantify how well our sample statistics approximate population parameters net of sampling error.

What is a distribution?

In statistics, a distribution is simply the array of values for one or more variables across a set of units (people, groups, etc.).

The distribution is everything

•  The concept of “distribution” has been at the implicit center of absolutely everything we have talked about so far!

•  We have specifically looked at sample statistic distributions, such as…

Frequency Distributions

. tab health

 1=poor,..., |
 5=excellent |      Freq.     Percent        Cum.
-------------+-----------------------------------
        poor |        729        7.05        7.05
        fair |      1,670       16.16       23.21
     average |      2,938       28.43       51.64
        good |      2,591       25.07       76.71
   excellent |      2,407       23.29      100.00
-------------+-----------------------------------
       Total |     10,335      100.00
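
As a rough illustration (written in Python rather than the Stata used on the slides), the Percent and Cum. columns of the frequency table can be rebuilt from the raw counts alone. The counts are copied from the `tab health` output; the variable names are only illustrative.

```python
# Counts copied from the `tab health` output (1=poor, ..., 5=excellent).
counts = {"poor": 729, "fair": 1670, "average": 2938, "good": 2591, "excellent": 2407}

total = sum(counts.values())

rows = []
cumulative = 0.0
for label, freq in counts.items():
    percent = 100 * freq / total   # share of all respondents in this category
    cumulative += percent          # running total of the shares so far
    rows.append((label, freq, round(percent, 2), round(cumulative, 2)))

for label, freq, percent, cum in rows:
    print(f"{label:>10} {freq:>7,} {percent:>8.2f} {cum:>8.2f}")
print(f"{'Total':>10} {total:>7,} {100:>8.2f}")
```

The cumulative column is simply a running sum of the percent column, which is why it must end at 100.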

Frequency distributions as “histograms”

[Figure: histogram of systolic blood pressure (x-axis: systolic blood pressure, 50 to 300; y-axis: Frequency)]

Bimodal distribution with categorical variables

[Figure: bar chart of proportions (0 to .8) for Male and Female; legend: Solid R or D, Likely R or D, Leaning R or D, Toss-Up]

Bimodal distribution with continuous variables

[Figure: histogram of age (x-axis: age in years, 20 to 80; y-axis: Frequency)]

Note the two peaks.

Distributions and Probability Theory

•  Distributions serve much more than a descriptive purpose.

•  We rely on the asymptotic theory of probability distributions to make statistical inferences.

•  Such a theory gives us dependable ideas about what the sampling distribution will look like.

What is “asymptotic”?

•  “Asymptotic” refers to the property that, if sampled an infinite number of times, a statistic will converge to the population parameter it is meant to approximate.

•  That is, the statistic approaches the parameter as the number of samples grows without bound, assuming that all n are random samples of N.
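
The convergence idea can be sketched with a small simulation. This is only an illustration under stated assumptions: the population below is hypothetical (10,000 skewed values generated in the code), each sample has n = 50, and Python stands in for Stata. The average of the sample means should drift toward the population mean as the number of samples grows.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical, deliberately skewed population of N = 10,000 units.
population = [rng.expovariate(1 / 10) for _ in range(10_000)]
mu = statistics.fmean(population)

def mean_of_sample_means(num_samples, n=50):
    """Average the means of `num_samples` random samples of size n."""
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(num_samples)]
    return statistics.fmean(means)

for num_samples in (10, 100, 5_000):
    estimate = mean_of_sample_means(num_samples)
    print(num_samples, round(estimate, 3), "error:", round(abs(estimate - mu), 3))

final_error = abs(mean_of_sample_means(5_000) - mu)
```

With only 10 samples the average of the sample means can still miss the population mean noticeably; with thousands of samples the gap is typically tiny.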

What is a “probability distribution”?

•  A “probability distribution” is an array of probabilistic values for a variable across a set of units, where the values are proportions that must sum to 1.

      V1            V2            V3            V4            V5            RMargins
 1    0.0000576938  0.1088889841  0.1090984138  0.0000576950  0.7818972133  1
 2    0.0000476265  0.0000476352  0.9123043912  0.0119956500  0.0756046971  1
 3    0.0000881123  0.0179937158  0.0264640660  0.0000881160  0.9553659899  1
 4    0.1108000716  0.2503997467  0.5927814401  0.0459855150  0.0000332266  1
 5    0.2411895005  0.0404012131  0.6524702741  0.0001041705  0.0658348418  1
 6    0.1470047772  0.0002170118  0.6761725881  0.0002169660  0.1763886570  1
 7    0.8856909695  0.0001638012  0.0001637739  0.0001637537  0.1138177018  1
 8    0.2807926994  0.2093476438  0.0530959614  0.0000664792  0.4566972163  1
 9    0.5049335568  0.0260158232  0.0000863853  0.4424611335  0.0265031011  1
10    0.0203117985  0.3800703244  0.2362073342  0.0352103566  0.3282001863  1
11    0.1901308947  0.5532532962  0.0002811341  0.0002811777  0.2560534974  1
12    0.3384214744  0.4279357679  0.1207937032  0.1128139993  0.0000350552  1
13    0.8520671572  0.0001154451  0.0462275639  0.1014744017  0.0001154321  1
14    0.8163828385  0.1158173532  0.0000841640  0.0213096942  0.0464059501  1
15    0.4869992267  0.2536825278  0.0000751098  0.0210530230  0.2381901127  1
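
The "must sum to 1" requirement is easy to check directly. The sketch below, in Python, copies the first three rows of the matrix and confirms that each row contains only proportions and that its row margin is 1, up to floating-point rounding.

```python
import math

# First three rows of the matrix (columns V1..V5).
rows = [
    [0.0000576938, 0.1088889841, 0.1090984138, 0.0000576950, 0.7818972133],
    [0.0000476265, 0.0000476352, 0.9123043912, 0.0119956500, 0.0756046971],
    [0.0000881123, 0.0179937158, 0.0264640660, 0.0000881160, 0.9553659899],
]

row_sums = [sum(row) for row in rows]

# A row is a valid probability distribution if every value lies in [0, 1]
# and the values add up to 1 (allowing for tiny rounding error).
is_probability_row = [
    all(0 <= p <= 1 for p in row) and math.isclose(total, 1.0, abs_tol=1e-9)
    for row, total in zip(rows, row_sums)
]

print(row_sums)
print(is_probability_row)
```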

Putting it together

•  If we have an infinite number of random sample estimates from the same population, the mean of this distribution of estimates will converge to the population mean.

•  Given that we can never really have an infinite number of samples, the asymptotic theory of probability distributions suggests that, with larger sample sizes, we can infer with greater degrees of probabilistic confidence whether or not our single sample statistic accurately reflects the unknown population parameter.

•  As n approaches N, the sampling error gets smaller and smaller, meaning that the reliability of our estimate gets better and better. This is because, theoretically, if the sample size (n) keeps growing, it will eventually just be the population (N)!
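
The last point can be illustrated with a hedged Python sketch. It assumes a hypothetical finite population of N = 2,000 values generated in the code, draws samples without replacement, and tracks the average gap between the sample mean and the population mean as n grows toward N.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed for repeatability

# Hypothetical finite population of N = 2,000 units.
N = 2_000
population = [rng.gauss(130, 23) for _ in range(N)]
mu = statistics.fmean(population)

def average_sampling_error(n, reps=200):
    """Mean absolute gap between the sample mean and mu over `reps` samples
    of size n drawn without replacement."""
    gaps = [abs(statistics.fmean(rng.sample(population, n)) - mu) for _ in range(reps)]
    return statistics.fmean(gaps)

errors = {n: average_sampling_error(n) for n in (10, 100, 1_000, N)}
for n, err in errors.items():
    print(f"n = {n:>5}: average sampling error = {err:.4f}")
```

When n equals N the sample is the population, so the sampling error vanishes entirely.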

Putting it together

•  Of course, such a theory requires that we make assumptions about the shape of the unknown population distribution.

•  Otherwise we don’t know what n is approximating!

Central Limit Theorem

•  Lucky for us, some very intelligent statisticians who came before us noticed that, as sample sizes grow larger and larger, the distribution of sample means becomes approximately normal, regardless of whether or not the variable itself is normally distributed in the population. This is the Central Limit Theorem (CLT).

•  By normal distribution, we mean a symmetric distribution where approximately half of the data fall to either side of the mean. It is commonly known as a distribution that follows a bell curve.

Central Limit Theorem

According to the CLT, we can expect that, with a normal distribution of random sample means, approximately 68% of the sample means will be within one standard deviation on either side of the population mean (μ). 95% will be within two, and 99.7% within three.

*Figure from Math Is Fun website (https://www.mathsisfun.com/data/standard-normal-distribution.html).
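
A minimal simulation sketch of the 68/95/99.7 pattern follows, in Python. It assumes a skewed exponential population whose mean and standard deviation are both known to be 1, so the standard deviation of the sampling distribution for samples of size n is 1/√n. None of this comes from the NHANES data; it only illustrates that sample means from a non-normal variable still follow the normal-curve percentages closely.

```python
import math
import random
import statistics

rng = random.Random(2016)  # fixed seed for repeatability

# Skewed (exponential) population with known mean 1 and standard deviation 1.
mu, sigma = 1.0, 1.0
n = 100                                # size of each random sample
standard_error = sigma / math.sqrt(n)  # SD of the sampling distribution

sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(10_000)
]

def share_within(k):
    """Proportion of sample means within k standard errors of mu."""
    hits = sum(abs(m - mu) <= k * standard_error for m in sample_means)
    return hits / len(sample_means)

shares = {k: share_within(k) for k in (1, 2, 3)}
print(shares)  # expected to land near 0.68, 0.95, and 0.997
```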

Central Limit Theorem

Notice how the data become more symmetric about the mean as the sample size increases. As such, large sample sizes can serve as “proxies” for repeated random samples and justify the CLT.

[Figure: four density histograms, one each for n = 50, n = 500, n = 5,000, and n = 50,000 (y-axis: Density)]

Standard Error

•  So, what can we say about population parameters given a sample statistic and these population distribution assumptions?

•  For starters, we can calculate the standard deviation of the theoretical distribution of random sample means around the unknown population mean, also known as the standard deviation of the sampling distribution. This is known as the standard error, and can be found with:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

    – Where $\sigma$ is the population standard deviation and the denominator is the square root of the sample size.

Standard Error

•  Of course, $\sigma$ is usually not known, so we use the sample standard deviation as an approximation:

$$s_{\bar{x}} = \frac{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 / (n - 1)}}{\sqrt{n}}$$

•  Or more simply:

$$s_{\bar{x}} = \frac{s}{\sqrt{n}}$$

Standard Error

•  What does the standard error of the systolic blood pressure variable tell us? How is this different from the standard deviation?

. sum bpsystol

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
    bpsystol |      10337    130.8826    23.34159         65        300

. ci bpsystol

    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
    bpsystol |      10337    130.8826    .2295796        130.4325    131.3326

Standard Error

•  A random sample mean drawn from the population (such as this one) likely differs from the population mean systolic blood pressure by about 0.23 mmHg.

Standard Error

•  The standard deviation, however, is merely a descriptive indication of variable dispersion. The average respondent in the sample diverges about 23.34 mmHg from the mean.

Standard Error

•  Just to check the math:

$$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{23.34159}{\sqrt{10337}} \approx 0.2296$$
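
The same check can be run in a couple of lines of Python. The inputs are the Obs and Std. Dev. values reported by `sum bpsystol`; because the displayed standard deviation is itself rounded, the result should agree with Stata's Std. Err. only to roughly six decimal places.

```python
import math

# Values copied from the `sum bpsystol` output.
n = 10337
s = 23.34159

# Standard error of the mean: sample standard deviation over root n.
standard_error = s / math.sqrt(n)
print(round(standard_error, 4))
```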

Standard Error

•  Notice that the standard error, $s_{\bar{x}} = s/\sqrt{n}$, gets smaller when two things happen:
    – (1) When s, the standard deviation, is small.
    – (2) When the sample size is large.

•  But also note that s itself is smaller when the sample size is larger.

•  When n is large, and therefore a closer approximation of N, the sampling distribution varies less and it is more likely that the sample mean represents the population mean!
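
The two drivers of the standard error can be seen numerically in the short Python sketch below. It holds the standard deviation fixed at the bpsystol value from the slides and varies n, then holds n fixed and halves s. The sample sizes other than 10,337 are hypothetical.

```python
import math

def standard_error(s, n):
    """Standard error of the mean from a sample standard deviation s and sample size n."""
    return s / math.sqrt(n)

s = 23.34159  # sample standard deviation of bpsystol from the slides

# Holding s fixed, quadrupling n halves the standard error.
table = {n: standard_error(s, n) for n in (100, 400, 1_600, 10_337)}
for n, se in table.items():
    print(f"n = {n:>6}: SE = {se:.4f}")

# Holding n fixed, halving s halves the standard error.
half_s = standard_error(s / 2, 10_337)
print(f"s halved at n = 10,337: SE = {half_s:.4f}")
```

Because n enters through a square root, cutting the standard error in half takes four times as many cases.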

Confidence Interval for Mean

•  We can also use the standard error and our knowledge of the normal distribution to construct a confidence interval around the mean, that is, the band of values within which the population mean, µ, is likely to reside.

Confidence Interval for Mean

•  The confidence interval can be found with:

$$\bar{x} \pm z \cdot s_{\bar{x}} \quad \text{or} \quad \bar{x} \pm z \frac{s}{\sqrt{n}}$$

•  Where z is our critical value: i.e., the number of standard deviations away from the mean that represent the range of probabilities within which we think the population mean resides.

Wait…z-value? What’s that?

•  Think back to what the CLT tells us:
    – About 68% of sample means fall within about one standard deviation on either side of the population mean.
    – About 95% fall within about two.
    – About 99.7% fall within about three.

•  We can use this information to find the standard deviations that correspond to the distribution percentiles that capture these percentages.

Wait…z-value? What’s that?

•  For example, though we say that 95% of the estimates fall within about two standard deviations of the population mean, the more precise number is 1.96. It is our z-value!

That is, about 95% of the sample means fall within ±1.96 standard deviations of the population mean. (Never mind that 0; for our purposes, think of it as µ.)

*Photo from Wikipedia (https://en.wikipedia.org/wiki/1.96).
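
Critical values like 1.96 come from the inverse of the standard normal cumulative distribution. A minimal sketch using Python's standard library (`statistics.NormalDist`) is shown here; the helper name `critical_z` is purely illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def critical_z(confidence):
    """Two-sided critical value: the z that leaves (1 - confidence) / 2 in each tail."""
    upper_tail = (1 - confidence) / 2
    return standard_normal.inv_cdf(1 - upper_tail)

z_values = {level: critical_z(level) for level in (0.68, 0.95, 0.99, 0.997)}
for level, z in z_values.items():
    print(f"{level:.1%} of sample means fall within ±{z:.2f} standard deviations")
```

The "one, two, three" rule of thumb is a rounding of these more precise values.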

Confidence Interval Example

•  The mean weight (in kilograms) in our NHANES sample is 71.90. The standard deviation is 15.36, and our sample size is 10,337. Within what range of kilograms can we be 95% confident that the population mean lies?

. sum weight

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      weight |      10337    71.90088    15.35515      30.84     175.88

Confidence Interval Example

•  Let’s start by first computing the standard error:

$$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{15.35515}{\sqrt{10337}} \approx 0.151$$

Confidence Interval Example

•  We want to capture the population mean within the band of values that, according to the CLT, likely fall within ±1.96 standard deviations from the population mean. As such:

$$71.90088 \pm 1.96 \times 0.151 \approx 71.90088 \pm 0.296$$

Confidence Interval Example

•  There is a 95% chance that the interval between 71.605 kg and 72.197 kg contains the mean population weight.

Confidence Interval Example

•  Confirm with Stata:

    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
      weight |      10337    71.90088    .1510277        71.60484    72.19692
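
The hand calculation can be reproduced with the Python sketch below, using the values reported by `sum weight` and the slide's critical value of 1.96. The bounds should match Stata's to about three decimal places; small differences further out are to be expected, since the inputs are rounded and Stata may use a slightly different critical value.

```python
import math

# Values copied from the `sum weight` output in the slides.
n = 10337
mean = 71.90088
s = 15.35515

standard_error = s / math.sqrt(n)
z = 1.96  # critical value for 95% confidence, as in the slides

lower = mean - z * standard_error
upper = mean + z * standard_error

print(f"SE = {standard_error:.4f}")
print(f"95% CI: [{lower:.3f}, {upper:.3f}]")
```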

CI example with different critical value

•  What if we wanted to be, say, 99% confident that our interval contains µ?

•  The critical z-value for a 99% confidence interval is 2.58. This means that, following the CLT, we expect about 99% of sample means pulled randomly from our sampling distribution to fall within ±2.58 standard deviations of the population mean.

CI example with different critical value

•  Let’s do the math:

$$71.90088 \pm 2.58 \times 0.151 \approx 71.90088 \pm 0.390$$

CI example with different critical value

•  We can say that, 99 times out of 100, we have captured the mean population weight with the interval between 71.51 kg and 72.29 kg.

CI example with different critical value

•  Confirm with Stata:

    Variable |        Obs        Mean    Std. Err.       [99% Conf. Interval]
-------------+---------------------------------------------------------------
      weight |      10337    71.90088    .1510277        71.51179    72.28997

Confidence Interval Precision

•  Note that the confidence interval gets bigger when we go from 95% to 99% confidence.
    – For 95% CI: 72.197 – 71.605 = .592
    – For 99% CI: 72.290 – 71.512 = .778

•  This is because we have to have less precision when we try to be more confident!
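
The trade-off between confidence and precision can be sketched in Python by computing both intervals from the same standard error. The inputs are the `sum weight` values and the two critical values used on the slides; the widths may differ from the figures above in the third decimal because those were taken from Stata's bounds.

```python
import math

n, mean, s = 10337, 71.90088, 15.35515  # from the `sum weight` output
standard_error = s / math.sqrt(n)

def confidence_interval(z):
    """Return (lower, upper) for the mean plus or minus z standard errors."""
    margin = z * standard_error
    return mean - margin, mean + margin

intervals = {"95%": confidence_interval(1.96), "99%": confidence_interval(2.58)}
widths = {level: upper - lower for level, (lower, upper) in intervals.items()}

for level, (lower, upper) in intervals.items():
    print(f"{level} CI: [{lower:.2f}, {upper:.2f}]  width = {widths[level]:.3f}")
```

Only the critical value changes between the two intervals, so the wider interval is entirely the price of the extra confidence.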

z-scores

•  Recall that critical z-values (e.g., ±1.96 and ±2.58) are the standard deviations away from µ that we would expect to capture 95% and 99% of random sample means (respectively) in a normal sampling distribution.

•  Theoretically, the z-value for any given case can be calculated as its distance from µ divided by the standard deviation. This value would tell us how many standard deviations the case is from µ.

•  We can apply this same logic to an individual variable to quantify how far a specific case is from the variable mean. This measure is called a z-score, or a standardized score.

z-scores

•  The z-score for an individual case can be found by subtracting the variable mean from the raw score and then dividing the difference by the variable standard deviation:

$$z_i = \frac{x_i - \bar{x}}{s}$$

    – Where, as before, $\bar{x}$ is the mean for the variable and s is the variable standard deviation.

z-score example

•  Below is the distribution of systolic blood pressure readings for the NHANES sample. The mean is 130.88 mmHg. The green bar is a particular value of the variable: 110 mmHg. The standard deviation is 23.34 mmHg. What is the z-score for this case, and what does this number mean?

[Figure: histogram of systolic blood pressure with the bar at 110 highlighted in green (x-axis: systolic blood pressure, 50 to 290; y-axis: Frequency)]

z-score example

$$z = \frac{110 - 130.88}{23.34} \approx -0.89$$

•  A case with a systolic blood pressure reading of 110 mmHg is a little less than 1 standard deviation below the mean.
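
As a quick check of this example, the z-score can be computed in Python from the rounded mean and standard deviation quoted on the slides. The function name `z_score` is just an illustrative helper.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Rounded values quoted on the slides for systolic blood pressure.
z = z_score(110, mean=130.88, sd=23.34)
print(round(z, 2))
```

The negative sign indicates that the case sits below the mean, and the magnitude says by how many standard deviations.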

z-score example

•  Confirming with Stata. Note that the same bar is colored green. That’s because they are the same cases!

[Figure: two histograms, one of systolic blood pressure (x-axis: 50 to 290; y-axis: Frequency) and one of the standardized variable zsystol (x-axis: -2 to 8; y-axis: Frequency)]
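
Standardizing a whole variable, as in the zsystol histogram, applies the z-score formula to every case. The Python sketch below uses a handful of hypothetical readings (not NHANES values), since the slides do not show how zsystol was generated in Stata. After standardizing, the variable has mean 0 and standard deviation 1, while the ordering of cases is unchanged.

```python
import statistics

# Hypothetical systolic readings; not taken from the NHANES data.
readings = [98, 110, 118, 124, 131, 139, 150, 172]

mean = statistics.fmean(readings)
sd = statistics.stdev(readings)  # sample standard deviation (n - 1 denominator)

# Apply the z-score formula to every case.
z_scores = [(x - mean) / sd for x in readings]

print([round(z, 2) for z in z_scores])
print(round(statistics.fmean(z_scores), 10), round(statistics.stdev(z_scores), 10))
```

Because standardizing only shifts and rescales the values, the histogram keeps its shape, which is why the same bar is highlighted in both plots.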

Conclusion

•  We have seen how the asymptotic theory of probability distributions allows us to assess how well our single sample mean represents the true population mean in the absence of repeated random samples.

•  It does this by following the CLT. This allows us to use our sample size to estimate how well hypothetical random samples (of the same size) would approximate a normal distribution and therefore approximate the population mean.

Conclusion

•  Though standard errors and confidence intervals help us get an idea of where the population mean may be, how do we know these numbers are reliable? That is, how do we know that our estimates aren’t just the product of sampling error?

•  This is the job of statistical inference, and it is the topic for the next session!

Datasets Used

•  The Stata survey documentation data, nhanes2f, from the Stata Press website. Retrieved July 24, 2016 (http://www.stata-press.com/data/r11/svy.html).