Sampling and Summary Statistics
Biology 683
Lecture 2
Heath Blackmon
Last week
• What are some causes of the reproducibility crisis?
• Common problems in plots?
Today
1. Terminology
2. Summarizing Data
3. Central Limit Theorem
4. Covariance and Correlation
Populations and Samples
• Populations: Some sort of group of something; could be anything
  • Undergraduates at Texas A&M
  • Jewel beetles in Arizona
  • Strain of flies in the lab
• Samples: A subset of individuals drawn from a population
What is the population?
We wanted to examine any association between the severity of injuries, and the height from which cats fall in high-rise buildings.
In the period between January 1, 1998 and December 12, 2001 at the Clinic of Surgery, Orthopedics and Ophthalmology of the Veterinary Faculty, 119 cats were treated after a fall or jump from a balcony or window, where the owners saw the fall, or where there was a reasonable suspicion that a fall had occurred. Only those cats that fell from the second or higher stories were included. The owners brought the cats for treatment within varying periods of time after the fall (from 30 min to over a month).
Vnuk, et al. "Feline high-rise syndrome: 119 cases (1998–2001)." Journal of Feline Medicine & Surgery 6.5 (2004): 305-312.
Sampling Considerations
Target population
• The sample needs to be representative of the target population
• A sample of people from College Station, for instance, would probably not be representative of New Yorkers
Sampling Error
• Chance alone will cause your sample to depart from the population
Parameters, estimates, and sampling considerations
Parameter: A population-level quantity we are trying to estimate
Estimate or Statistic: The value of the parameter inferred from the sample
Bias: A systematic error that arises when something about the sampling procedure causes the sample to misrepresent the population
Precision: How tightly grouped are the estimates?
Accuracy vs Precision
• Precision is a measure of spread
• Accuracy is a measure of bias
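The difference between bias and spread can be sketched with a small simulation. This is a minimal illustration in Python (standard library only), not part of the original slides; the true mean of 50, the offsets, and the spreads are made-up values chosen only to make the contrast visible.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

TRUE_MEAN = 50.0  # hypothetical population parameter (an assumption)


def repeated_estimates(offset, spread, n_estimates=1000):
    """Simulate estimates that carry a systematic offset (bias) and random spread."""
    return [random.gauss(TRUE_MEAN + offset, spread) for _ in range(n_estimates)]


def summarize(estimates):
    """Return (bias, spread): the mean error and the standard deviation of the estimates."""
    bias = statistics.mean(estimates) - TRUE_MEAN
    spread = statistics.stdev(estimates)
    return bias, spread


# Accurate but imprecise: centered on the truth, widely scattered.
accurate_imprecise = repeated_estimates(offset=0.0, spread=5.0)
# Precise but inaccurate: tightly grouped, but around the wrong value.
biased_precise = repeated_estimates(offset=4.0, spread=0.5)

for label, estimates in [("accurate but imprecise", accurate_imprecise),
                         ("precise but biased", biased_precise)]:
    bias, spread = summarize(estimates)
    print(f"{label}: bias = {bias:.2f}, spread = {spread:.2f}")
```

Collecting more data shrinks the spread but does nothing to the offset, which is why a biased sampling procedure cannot be rescued by a larger sample.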
Random Sampling
1. Every unit in a population should have an equal chance of being sampled.
2. The selection of units must be independent.
3. Lots of ways of being non-random…
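A random sample that meets the first two conditions can be sketched as follows. This is an illustrative Python example, not from the slides; the list of 100 beetle labels is a hypothetical sampling frame. `random.sample` draws without replacement and gives every unit the same chance of selection.

```python
import random

random.seed(2)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: every individual in the population, labeled.
population = [f"beetle_{i}" for i in range(1, 101)]

# Random sample: each unit has an equal chance of being chosen,
# and no unit's selection makes a particular other unit more likely.
random_sample = random.sample(population, k=10)

# Non-random alternative for contrast: a convenience sample of the first
# ten individuals encountered (for example, those nearest the trail).
convenience_sample = population[:10]

print("random:     ", random_sample)
print("convenience:", convenience_sample)
```

The convenience sample is one of the many ways of being non-random: whatever made those ten individuals easy to reach may also be related to the trait being measured.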
Your big idea should be a hypothesis
A statistical hypothesis is a specific claim about a population parameter
Caloric restriction increases the lifespan of Drosophila melanogaster.
The rate of evolution in wingless species is higher than in winged species.
Pesticide exposure causes feminization of amphibian males.
Repetitive DNA content is higher in venomous than in nonvenomous reptiles.
Data
Variables: The characteristics that differ among individuals
Data: The measurements of variables taken for a sample of individuals
Categorical Variables: Individuals are in qualitative categories
• Ordinal: The categories can be ordered
• Nominal: The categories have no inherent order
Numerical Variables: Individuals vary on a quantitative scale
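Continuous vs Discrete
Continuous variables: A variable that can take any value within a range, so the number of possible values is infinite (e.g., body mass)
Discrete variables: A variable that takes only separate, countable values (e.g., number of offspring)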
Explanatory and Response Variables
Experimental vs observational studies
• Does caloric restriction increase lifespan in mice?
• Is global warming caused by human activities?
• Does smoking cause lung cancer in humans?
• Does parasite infection reduce mating success of beetles?
• Does oxytocin affect sexual attraction in humans?
• Do sex chromosomes increase the rate of speciation?
Why should we summarize data?
• Many datasets are simply too big to look at all the values and form an impression.
• Our impressions of small datasets are often misled by our tendency to look for patterns.
Typical summary statistics
• Mean: Sum of the observations divided by the number of observations
• Median: The middle observation in a set of data
• Variance: The average squared deviation from the mean
• Standard Deviation: The square root of the variance
Mean and variance
Box Plot
• Box: interquartile range
• Whiskers: range of the data, excluding outlying points
• Individual points: data points more than 1.5 interquartile ranges beyond the edge of the box
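The numbers behind a box plot can be computed directly. The sketch below is an illustrative Python example, not from the slides; the measurements are made up. It assumes the common convention in which points are flagged when they lie more than 1.5 interquartile ranges beyond the first or third quartile. Quartile definitions differ slightly between software packages, so plotted boxes may not match these values exactly.

```python
import statistics

# Made-up measurements with one unusually large value.
data = [2.1, 2.4, 2.5, 2.7, 2.9, 3.0, 3.2, 3.3, 3.6, 9.8]

# statistics.quantiles with n=4 returns the three quartile cut points.
q1, median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # interquartile range: the height of the box

# Fences: points outside these limits are drawn individually.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

# Whiskers reach the most extreme observations still inside the fences.
inside = [x for x in data if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print(f"Q1 = {q1:.3f}, median = {median:.3f}, Q3 = {q3:.3f}, IQR = {iqr:.3f}")
print(f"whiskers: {whisker_low} to {whisker_high}")
print(f"outliers: {outliers}")
```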
Symbols for samples and populations
Samples versus Populations: The mean or standard deviation statistic you calculate from your sample is an estimate of the population parameter.
Parameter Symbols:
• μ: population mean
• σ: population standard deviation
Statistic Symbols:
• $\bar{Y}$: sample mean
• $s$: sample standard deviation
For a sample of a population
The mean is just: $\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n}$
The standard deviation is: $s = \sqrt{s^2}$
Where $s^2$, or the variance, is: $s^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}$
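The sample formulas can be checked by hand on a tiny dataset. This is a minimal Python sketch, not from the slides; the five observations are made up. The function names are illustrative, and the results are compared with Python's `statistics` module, which also divides by n − 1 for the sample variance.

```python
import math
import statistics


def sample_mean(y):
    """Sum of the observations divided by the number of observations."""
    return sum(y) / len(y)


def sample_variance(y):
    """Sum of squared deviations from the mean, divided by n - 1."""
    y_bar = sample_mean(y)
    return sum((y_i - y_bar) ** 2 for y_i in y) / (len(y) - 1)


def sample_sd(y):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(y))


y = [4.0, 7.0, 5.0, 9.0, 10.0]  # made-up observations

# Mean: 35 / 5 = 7. Squared deviations: 9 + 0 + 4 + 4 + 9 = 26. Variance: 26 / 4 = 6.5.
print(sample_mean(y), sample_variance(y), round(sample_sd(y), 4))

# The library functions use the same n - 1 definition.
print(statistics.mean(y), statistics.variance(y), round(statistics.stdev(y), 4))
```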
Central limit theorem
• Imagine that we sample from the same population many times, so we have a bunch of different, independent samples.
• Each sample will have a mean, but the means will be different due to chance. In principle, we could draw a histogram of these means.
• In general, however, you only have one sample from a given population, so what can you infer about the distribution of the means from your sample?
• The Central Limit Theorem states that regardless of the underlying population distribution of the variable of interest (provided its variance is finite), the distribution of sample means will be roughly normal when the sample size is reasonably large.
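The claim can be explored with a short simulation. The sketch below is an illustrative Python example, not from the slides. It assumes a right-skewed gamma population (shape 2, scale 1), a sample size of 50, and 2000 repeated samples, all arbitrary choices. Skewness is used as a rough summary of asymmetry: a normal distribution has skewness 0.

```python
import random
import statistics

random.seed(3)  # fixed seed so the sketch is repeatable


def skewness(values):
    """Mean cubed z-score: near 0 for a symmetric shape, positive for a long right tail."""
    m = statistics.mean(values)
    sd = statistics.pstdev(values)
    return statistics.mean([((v - m) / sd) ** 3 for v in values])


def draw():
    """One observation from a right-skewed population (gamma; an assumed example)."""
    return random.gammavariate(2.0, 1.0)


n = 50            # observations per sample
n_samples = 2000  # number of independent samples

# Individual observations: strongly right-skewed.
raw = [draw() for _ in range(n_samples)]

# Means of many independent samples: much closer to symmetric.
sample_means = [statistics.mean([draw() for _ in range(n)]) for _ in range(n_samples)]

print(f"skewness of raw observations: {skewness(raw):.2f}")
print(f"skewness of sample means:     {skewness(sample_means):.2f}")
```

Increasing `n` pulls the skewness of the sample means closer to zero, while the raw observations stay just as skewed.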
Central limit theorem
Your sample mean is an estimate of the mean of this distribution of means (that is, it's your best estimate of the population mean).
The hypothetical distribution of sample means has a standard deviation estimated by $s$ divided by the square root of $n$.
$SE_{\bar{Y}} = \frac{s}{\sqrt{n}}$
We call this standard deviation the standard error of the mean (SEM).
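The standard error formula can be compared against a brute-force simulation. This is a minimal Python sketch, not from the slides; the normal population with mean 10 and standard deviation 2, the sample size of 25, and the function name `standard_error` are all assumptions made for illustration.

```python
import math
import random
import statistics

random.seed(4)  # fixed seed so the sketch is repeatable


def standard_error(sample):
    """Standard error of the mean: sample standard deviation over sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


MU, SIGMA = 10.0, 2.0  # hypothetical population parameters
n = 25

# What we usually have: a single sample, and the SEM estimated from it.
one_sample = [random.gauss(MU, SIGMA) for _ in range(n)]
sem = standard_error(one_sample)

# What we usually cannot do: draw thousands of samples and measure
# the spread of their means directly.
means = [statistics.mean([random.gauss(MU, SIGMA) for _ in range(n)])
         for _ in range(5000)]
sd_of_means = statistics.stdev(means)

print(f"SEM from one sample:        {sem:.3f}")
print(f"SD of 5000 sample means:    {sd_of_means:.3f}")
print(f"theoretical sigma/sqrt(n):  {SIGMA / math.sqrt(n):.3f}")
```

The single-sample SEM will not match the theoretical value exactly, because `s` is itself an estimate of σ, but it should land in the same neighborhood.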
Central limit theorem
Let's try that
Estimating with uncertainty
Error bars
• Error bars can be a useful way to show uncertainty when it's not possible to show the actual data points.
• Usually, they represent 1 SE or the 95% CI, but not always.
• THE FIGURE LEGEND SHOULD INDICATE WHAT THE ERROR BARS REPRESENT!
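The two common kinds of error bar differ a lot in width, which is why the legend matters. The sketch below is an illustrative Python example, not from the slides; the measurements and the function name `error_bars` are made up. It assumes a normal approximation for the 95% CI (about 1.96 SE on each side); for small samples a t-based interval would be somewhat wider.

```python
import math
import statistics


def error_bars(sample, level=0.95):
    """Return the mean with a +/- 1 SE bar and a normal-approximation CI bar."""
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    # Normal quantile for the requested level (about 1.96 for 95%).
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return {
        "mean": mean,
        "se_bar": (mean - se, mean + se),
        "ci_bar": (mean - z * se, mean + z * se),
    }


# Made-up measurements (for example, wing lengths in mm).
wing_lengths = [4.1, 3.8, 4.4, 4.0, 3.9, 4.3, 4.2, 3.7, 4.5, 4.1]

bars = error_bars(wing_lengths)
print(f"mean:        {bars['mean']:.3f}")
print(f"+/- 1 SE:    {bars['se_bar'][0]:.3f} to {bars['se_bar'][1]:.3f}")
print(f"approx. 95%: {bars['ci_bar'][0]:.3f} to {bars['ci_bar'][1]:.3f}")
```

The 95% CI bar is roughly twice as wide as the 1 SE bar, so the same data can look far more or less certain depending on which one is drawn.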
Your turn
• Let's demonstrate that the means of samples from an exponential distribution are approximately normally distributed.
You will need: rexp, hist, mean
Might use: for, sample