Day 2, Morning: The Logic of Distributions

Instructor: Marshall A. Taylor
844 Flanner Hall
mtaylo15@nd.edu
marshalltaylor.net

•  Yesterday morning we talked about:

– The descriptive and inferential purposes of statistics.

– The difference between samples and populations, the issues that arise because of sampling error and sample bias, and the ways in which probability theory can be used to address the former.

– The basics of probability theory.

•  Yesterday evening we talked about:

– The difference between univariate, bivariate, and multivariate statistics.

– What a variable is and the general forms that it can take: nominal, ordinal, or continuous.

– Measures of central tendency for each type of variable.

– Measures of dispersion.

Game Plan for Today

•  Morning

– We will bring these previous lectures together to show how we can use probability theory to assess how representative our sample distribution is of the population distribution.

•  Evening

– After outlining the asymptotic theory of probability distributions in the morning, we will then examine basic univariate tests of statistical inference to quantify how well our sample statistics approximate population parameters net of sampling error.

What is a distribution?

In statistics, a distribution is simply the array of values for one or more variables across a set of units (people, groups, etc.).

The distribution is everything

•  The concept of “distribution” has been at the implicit center of absolutely everything we have talked about so far!

•  We have specifically looked at sample statistic distributions, such as…

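Frequency Distributions

. tab health

1=poor,..., |
5=excellent |      Freq.     Percent        Cum.
------------+-----------------------------------
       poor |        729        7.05        7.05
       fair |      1,670       16.16       23.21
    average |      2,938       28.43       51.64
       good |      2,591       25.07       76.71
  excellent |      2,407       23.29      100.00
------------+-----------------------------------
      Total |     10,335      100.00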
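The percent and cumulative-percent columns of a frequency distribution can be recomputed from the raw counts. The following is a minimal Python sketch (the slides themselves use Stata); the counts are copied from the `tab health` output, and the function name `frequency_table` is purely illustrative.

```python
# Counts copied from the `tab health` output; labels run from 1=poor to 5=excellent.
counts = {"poor": 729, "fair": 1670, "average": 2938, "good": 2591, "excellent": 2407}


def frequency_table(counts):
    """Return (label, freq, percent, cumulative percent) rows in the given order."""
    total = sum(counts.values())
    rows, running = [], 0
    for label, freq in counts.items():
        running += freq
        rows.append((label, freq, 100 * freq / total, 100 * running / total))
    return rows


table = frequency_table(counts)
for label, freq, pct, cum in table:
    print(f"{label:>10} {freq:>7,} {pct:>8.2f} {cum:>8.2f}")
```

Each percent is the category count divided by the total; the cumulative column is the running sum of counts divided by the same total.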

Frequency distributions as “histograms”

[Figure: histogram of systolic blood pressure]

Bimodal distribution with categorical variables

[Figure: bar charts; category labels: Male, Female; Solid R or D, Likely R or D, Leaning R or D, Toss-Up]

Bimodal distribution with continuous variables

[Figure: histogram of age in years]

Note the two peaks.

Distributions and Probability Theory

•  Distributions serve much more than a descriptive purpose.

•  We rely on the asymptotic theory of probability distributions to make statistical inferences.

•  Such a theory gives us dependable ideas about what the sampling distribution will look like.

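What is “asymptotic”?

•  “Asymptotic” refers to the property that, if sampled an infinite number of times, a statistic will converge to the population parameter it is meant to approximate.

•  That is, the sample statistic approaches the population parameter (for example, $\bar{x} \to \mu$) as the number of samples approaches infinity, assuming that all n are random samples of N.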

What is a “probability distribution”?

•  A “probability distribution” is an array of probabilistic values for a variable across a set of units, where the values are proportions that must sum to 1.

       v1            v2            v3            v4            v5            RMargins
r1     0.0000576938  0.1088889841  0.1090984138  0.0000576950  0.7818972133  1
r2     0.0000476265  0.0000476352  0.9123043912  0.0119956500  0.0756046971  1
r3     0.0000881123  0.0179937158  0.0264640660  0.0000881160  0.9553659899  1
r4     0.1108000716  0.2503997467  0.5927814401  0.0459855150  0.0000332266  1
r5     0.2411895005  0.0404012131  0.6524702741  0.0001041705  0.0658348418  1
r6     0.1470047772  0.0002170118  0.6761725881  0.0002169660  0.1763886570  1
r7     0.8856909695  0.0001638012  0.0001637739  0.0001637537  0.1138177018  1
r8     0.2807926994  0.2093476438  0.0530959614  0.0000664792  0.4566972163  1
r9     0.5049335568  0.0260158232  0.0000863853  0.4424611335  0.0265031011  1
r10    0.0203117985  0.3800703244  0.2362073342  0.0352103566  0.3282001863  1
r11    0.1901308947  0.5532532962  0.0002811341  0.0002811777  0.2560534974  1
r12    0.3384214744  0.4279357679  0.1207937032  0.1128139993  0.0000350552  1
r13    0.8520671572  0.0001154451  0.0462275639  0.1014744017  0.0001154321  1
r14    0.8163828385  0.1158173532  0.0000841640  0.0213096942  0.0464059501  1
r15    0.4869992267  0.2536825278  0.0000751098  0.0210530230  0.2381901127  1
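The defining property in the matrix (each row's proportions sum to 1, as the RMargins column shows) can be checked directly. This is a small Python sketch using the first three rows of the matrix; the helper name `is_probability_distribution` and the tolerance are illustrative choices, not part of the original slides.

```python
import math

# First three rows of the matrix (columns v1..v5).
rows = {
    "r1": [0.0000576938, 0.1088889841, 0.1090984138, 0.0000576950, 0.7818972133],
    "r2": [0.0000476265, 0.0000476352, 0.9123043912, 0.0119956500, 0.0756046971],
    "r3": [0.0000881123, 0.0179937158, 0.0264640660, 0.0000881160, 0.9553659899],
}


def is_probability_distribution(values, tol=1e-8):
    """True if every value lies in [0, 1] and the values sum to 1 (within tol)."""
    in_range = all(0.0 <= v <= 1.0 for v in values)
    return in_range and math.isclose(sum(values), 1.0, abs_tol=tol)


for label, values in rows.items():
    print(label, is_probability_distribution(values))
```

A tolerance is needed because the printed values are rounded to ten decimal places.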

Putting it together

•  If we have an infinite number of random sample estimates from the same population, the mean of this distribution of estimates will converge to the population mean.

•  Given that we can never really have an infinite number of samples, the asymptotic theory of probability distributions suggests that, with larger sample sizes, we can infer with greater degrees of probabilistic confidence whether or not our single sample statistic accurately reflects the unknown population parameter.

•  As n approaches N, the sampling error gets smaller and smaller, meaning that the reliability of our estimate gets better and better. This is because, theoretically, if the sample size (n) keeps growing, it will eventually just be the population (N)!
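The claim that sampling error shrinks as n approaches N can be illustrated with a small simulation. The sketch below is in Python and rests on assumptions that are not in the slides: a made-up, right-skewed population of N = 10,000 values, a fixed random seed, and 100 repeated samples per sample size.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical finite population of N = 10,000 right-skewed values.
population = [rng.expovariate(1 / 50) for _ in range(10_000)]
mu = statistics.fmean(population)


def mean_abs_error(n, reps=100):
    """Average |sample mean - mu| over `reps` simple random samples of size n."""
    errors = []
    for _ in range(reps):
        sample = rng.sample(population, n)  # sampling without replacement
        errors.append(abs(statistics.fmean(sample) - mu))
    return statistics.fmean(errors)


errors = {n: mean_abs_error(n) for n in (10, 100, 1_000, 10_000)}
for n, err in errors.items():
    print(f"n = {n:>6}: average sampling error = {err:.4f}")
```

When n equals N, every "sample" is the whole population, so the sample mean matches the population mean and the sampling error vanishes.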

Putting it together

•  Of course, such a theory requires that we make assumptions about the shape of the unknown population distribution.

•  Otherwise we don’t know what n is approximating!

Central Limit Theorem

•  Lucky for us, some very intelligent statisticians who came before us noticed that, as sample sizes grow larger and larger, the distribution of sample means becomes approximately normal, regardless of whether or not the population itself is normally distributed. This is the Central Limit Theorem (CLT).

•  By normal distribution, we mean a symmetric distribution where approximately half of the data fall to either side of the mean. It is commonly known as a distribution that follows a bell curve.

Central Limit Theorem

According to the CLT, we can expect that, with a normal distribution of random sample means, approximately 68% of the sample means will be within one standard deviation on either side of the population mean (μ). About 95% will be within two, and about 99.7% within three.

*Figure from Math Is Fun website (https://www.mathsisfun.com/data/standard-normal-distribution.html).
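The 68%, 95%, and 99.7% figures are properties of the normal curve and can be recovered from its cumulative distribution function. A minimal Python sketch, using the standard library's `statistics.NormalDist` (the helper name `share_within` is illustrative):

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k):
    """Probability that a standard normal value lies within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")
```

The shares come out to roughly 0.6827, 0.9545, and 0.9973, which the slide rounds to 68%, 95%, and 99.7%.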

Central Limit Theorem

Notice how the data become more symmetric about the mean as the sample size increases. As such, large sample sizes can serve as “proxies” for repeated random samples and justify relying on the CLT.

[Figure: four density plots, for n = 50, n = 500, n = 5,000, and n = 50,000]
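The CLT itself can be illustrated by simulation: draw many samples from a clearly non-normal population and watch the distribution of their means lose its skew as n grows. The Python sketch below is not a reproduction of the figure; the exponential population, the seed, the 2,000 replications, and the sample sizes are all assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable


def sample_mean(n):
    """Mean of n draws from a right-skewed (exponential) population."""
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))


def skewness(values):
    """Moment-based skewness; values near 0 indicate a symmetric distribution."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - m) / sd) ** 3 for v in values)


# Skewness of the distribution of 2,000 sample means, for each sample size.
skew_by_n = {n: skewness([sample_mean(n) for _ in range(2_000)]) for n in (2, 50, 500)}
for n, skew in skew_by_n.items():
    print(f"n = {n:>3}: skewness of sample means = {skew:.3f}")
```

The population is strongly right-skewed, yet the sample means become close to symmetric once n is moderately large.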

Standard Error

•  So, what can we say about population parameters given a sample statistic and these population distribution assumptions?

•  For starters, we can calculate the standard deviation of the theoretical distribution of random sample means around the unknown population mean, also known as the standard deviation of the sampling distribution. This is known as the standard error, and can be found with:

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

– Where $\sigma$ is the standard deviation of the population and the denominator is the square root of the sample size.

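Standard Error

•  Of course, $\sigma$ is usually not known, so we use the sample standard deviation as an approximation:

$s_{\bar{x}} = \frac{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 / (n - 1)}}{\sqrt{n}}$

•  Or more simply:

$s_{\bar{x}} = \frac{s}{\sqrt{n}}$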

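Standard Error

•  What does the standard error of the systolic blood pressure variable tell us? How is this different from the standard deviation?

. sum bpsystol

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
    bpsystol |      10337    130.8826    23.34159         65        300

. ci bpsystol

    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
    bpsystol |      10337    130.8826    .2295796        130.4325    131.3326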

Standard Error

•  A random sample mean drawn from the population (such as this one) likely differs from the population mean systolic blood pressure by about 0.23 mmHg.

Standard Error

•  The standard deviation, however, is merely a descriptive indication of variable dispersion. The average respondent in the sample diverges from the mean by about 23.34 mmHg.

Standard Error

•  Just to check the math:

$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{23.34159}{\sqrt{10337}} \approx 0.2296$
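The same check can be run in a few lines of Python, using the standard deviation and sample size reported by `sum bpsystol`. The function name `standard_error` is illustrative.

```python
import math


def standard_error(sd, n):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return sd / math.sqrt(n)


# Values copied from the `sum bpsystol` output.
se_bp = standard_error(23.34159, 10337)
print(f"standard error of bpsystol: {se_bp:.4f}")
```

The result agrees with the Std. Err. that `ci bpsystol` reports (.2295796) up to rounding of the displayed standard deviation.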

Standard Error

•  Notice that the standard error, $s_{\bar{x}} = s / \sqrt{n}$, gets smaller when two things happen:

– (1) When s, the standard deviation, is small.

– (2) When the sample size is large.

•  But also note that s is itself a sample estimate, and it approximates σ more closely when the sample size is larger.

•  When n is large, and therefore a closer approximation of N, the sampling distribution varies less and it is more likely that the sample mean represents the population mean!

Confidence Interval for Mean

•  We can also use the standard error and our knowledge of the normal distribution to construct a confidence interval around the mean: that is, the band of values within which the population mean, µ, is likely to reside.

Confidence Interval for Mean

•  The confidence interval can be found with: $\bar{x} \pm z \left( \frac{s}{\sqrt{n}} \right)$ or $\bar{x} \pm z (s_{\bar{x}})$, where $s_{\bar{x}}$ is the standard error.

•  Where z is our critical value: i.e., the number of standard deviations away from the mean that bound the range of values within which we think the population mean resides.

Wait… z-value? What’s that?

•  Think back to what the CLT tells us:

– About 68% of sample means fall within about one standard deviation on either side of the population mean.

– About 95% fall within about two.

– About 99.7% fall within about three.

•  We can use this information to find the standard deviations that correspond to the distribution percentiles that capture these percentages.

Wait… z-value? What’s that?

•  For example, though we say that 95% of the estimates fall within about two standard deviations of the population mean, the more precise number is 1.96. It is our z-value!

That is, about 95% of the sample means fall within ±1.96 standard deviations of the population mean. (Never mind that 0; for our purposes, think of it as µ.)

*Photo from Wikipedia (https://en.wikipedia.org/wiki/1.96).
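Critical z-values such as 1.96 come from inverting the normal cumulative distribution function. A short Python sketch using the standard library's `statistics.NormalDist` (the helper name `critical_z` is illustrative):

```python
from statistics import NormalDist


def critical_z(confidence):
    """Two-sided critical z-value for a confidence level such as 0.95."""
    tail = (1 - confidence) / 2  # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)


for level in (0.68, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {critical_z(level):.3f}")
```

For 95% confidence, 2.5% is left in each tail, so the critical value is the 97.5th percentile of the standard normal distribution.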

Confidence Interval Example

•  The mean weight (in kilograms) in our NHANES sample is 71.90. The standard deviation is 15.36, and our sample size is 10,337. Within what range of kilograms can we be 95% confident that the population mean lies?

. sum weight

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      weight |      10337    71.90088    15.35515      30.84     175.88

Confidence Interval Example

•  Let’s start by first computing the standard error:

$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{15.36}{\sqrt{10337}} \approx 0.151$

Confidence Interval Example

•  We want to capture the population mean within the band of values that, according to the CLT, likely fall within ±1.96 standard deviations from the population mean. As such:

$\bar{x} \pm z (s_{\bar{x}}) = 71.901 \pm 1.96(0.151) = 71.901 \pm 0.296$

Confidence Interval Example

•  We can be 95% confident that the interval between 71.605 kg and 72.197 kg contains the mean population weight.

Confidence Interval Example

•  Confirm with Stata:

    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
      weight |      10337    71.90088    .1510277        71.60484    72.19692
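The whole calculation (standard error, then mean ± z × standard error) fits in one small function. This Python sketch uses the mean, standard deviation, and sample size from `sum weight`; the function name `mean_ci` and the default z = 1.96 are illustrative choices.

```python
import math


def mean_ci(mean, sd, n, z=1.96):
    """Normal-approximation confidence interval for a mean: mean ± z * sd / sqrt(n)."""
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se


# Values copied from the `sum weight` output.
low, high = mean_ci(71.90088, 15.35515, 10337)
print(f"95% CI for mean weight: {low:.3f} kg to {high:.3f} kg")
```

The bounds agree with Stata's interval to about three decimal places. Any remaining gap is consistent with Stata using a slightly different critical value than the rounded 1.96.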

CI example with different critical value

•  What if we wanted to be, say, 99% confident that our interval contains µ?

•  The critical z-value for a 99% confidence interval is 2.58. This means that, following the CLT, we expect about 99% of sample means pulled randomly from our sampling distribution to fall within ±2.58 standard deviations of the population mean.

•  Let’s do the math:

$\bar{x} \pm z (s_{\bar{x}}) = 71.901 \pm 2.58(0.151) = 71.901 \pm 0.390$

•  We can say that, 99 times out of 100, we have captured the mean population weight with the interval between 71.51 kg and 72.29 kg.

•  Confirm with Stata:

    Variable |        Obs        Mean    Std. Err.       [99% Conf. Interval]
-------------+---------------------------------------------------------------
      weight |      10337    71.90088    .1510277        71.51179    72.28997

Confidence Interval Precision

•  Note that the confidence interval gets wider when we go from 95% to 99% confidence.

– For 95% CI: 72.197 – 71.605 = .592

– For 99% CI: 72.290 – 71.512 = .778

•  This is because we have to give up precision when we try to be more confident!
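The trade-off between confidence and precision follows from the interval width, which is 2 × z × standard error: a larger critical value stretches the interval. A minimal Python sketch, assuming the rounded critical values 1.96 and 2.58 used in the slides (the function name `ci_width` is illustrative):

```python
import math


def ci_width(sd, n, z):
    """Full width of a normal-approximation confidence interval: 2 * z * sd / sqrt(n)."""
    return 2 * z * sd / math.sqrt(n)


# Standard deviation and sample size copied from the `sum weight` output.
width_95 = ci_width(15.35515, 10337, z=1.96)
width_99 = ci_width(15.35515, 10337, z=2.58)
print(f"95% CI width: {width_95:.3f} kg")
print(f"99% CI width: {width_99:.3f} kg")
```

The 99% width computed this way is about a thousandth of a kilogram larger than the .778 derived from Stata's bounds, because 2.58 is a rounded critical value.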

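z-scores

•  Recall that critical z-values (e.g., ±1.96 and ±2.58) are the numbers of standard deviations away from µ within which we would expect to capture 95% and 99% of random sample means (respectively) in a normal sampling distribution.

•  Theoretically, the z-value for any given case (here, a sample mean) can be calculated with $z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$, where $\sigma_{\bar{x}}$ is the standard error. This value would tell us how many standard deviations the case is from µ.

•  We can apply this same logic to an individual variable to quantify how far a specific case is from the variable mean. This measure is called a z-score, or a standardized score.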

z-scores

•  The z-score for an individual case can be found by subtracting the variable mean from the raw score and then dividing the difference by the variable standard deviation:

$z = \frac{x - \bar{x}}{s}$

– Where, as before, $\bar{x}$ is the mean for the variable and s is the variable standard deviation.

z-score example

•  Below is the distribution of systolic blood pressure readings for the NHANES sample. The mean is 130.88 mmHg. The green bar is a particular value of the variable: 110 mmHg. The standard deviation is 23.34 mmHg. What is the z-score for this case, and what does this number mean?

[Figure: histogram of systolic blood pressure, with the bar at 110 mmHg highlighted in green]

z-score example

$z = \frac{x - \bar{x}}{s} = \frac{110 - 130.88}{23.34} \approx -0.89$

•  A case with a systolic blood pressure reading of 110 mmHg is a little less than 1 standard deviation below the mean.
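The z-score for this case can be computed directly from the mean and standard deviation quoted in the example. A minimal Python sketch (the function name `z_score` is illustrative):

```python
def z_score(x, mean, sd):
    """Standardized score: how many standard deviations x lies from the mean."""
    return (x - mean) / sd


# Reading, mean, and standard deviation (in mmHg) taken from the example.
z = z_score(110, 130.88, 23.34)
print(f"z-score for a reading of 110 mmHg: {z:.2f}")
```

The negative sign indicates that the reading lies below the mean; the magnitude is a little less than one standard deviation.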

z-score example

•  Confirming with Stata. Note that the same bar is colored green. That’s because they are the same cases!

[Figure: histogram of zsystol (standardized systolic blood pressure)]

Conclusion

•  We have seen how the asymptotic theory of probability distributions allows us to assess how well our single sample mean represents the true population mean in the absence of repeated random samples.

•  It does this by following the CLT. This allows us to use our sample size to estimate how well hypothetical random samples (of the same size) would approximate a normal distribution and therefore approximate the population mean.

Conclusion

•  Though standard errors and confidence intervals help us get an idea of where the population mean may be, how do we know these numbers are reliable? That is, how do we know that our estimates aren’t just the product of sampling error?

•  This is the job of statistical inference, and it is the topic for the next session!

Datasets Used

•  The Stata survey documentation data, nhanes2f, from the Stata Press website. Retrieved July 24, 2016 (http://www.stata-press.com/data/r11/svy.html).
