90
Probability Distributions and Introduction to Statistical Inference BIO5312 FALL2017 STEPHANIE J. SPIELMAN, PHD

Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

Embed Size (px)

Citation preview

Page 1: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

ProbabilityDistributionsandIntroductiontoStatisticalInferenceBIO5312FALL2017

STEPHANIE J. SPIELMAN,PHD

Page 2: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

RandomvariableRandomprocessesproducenumericaloutcomes:◦ Numberoftailsin50coinflips◦ Thesumofeveryone'sheights

Definition:arandomvariableisafunctionthatmapsoutcomesofarandomprocesstoanumericvalue◦ X isafunction(rule)thatassignanumberX(s) toeachoutcomes ∈S (wheres isaneventinsamplespace S )

◦ r.v.'s aretechnicallyneitherrandomnorvariables…◦ But,youcanthinkofthemroughlynumericaloutcomesofrandomprocesses

Page 3: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

DiscretevscontinuousRVDiscrete randomvariablescantakeon(mapto)afinitenumberofvalues

Continuousrandomvariablescantakeon(mapto)innumerable/infinitevalues

Page 4: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

ExpressingdiscreterandomvariablesProbabilitymassfunction(PMF)◦ Describesthevaluestakenbyadiscreter.v.X anditsassociatedprobabilities◦ Functionthatassigns,toanypossiblevaluex ofadiscreter.v.X,theprobabilityP(X = x)

0.00

0.05

0.10

0.15

1 2 3 4 5 6Event

Even

t pro

babi

lity

PMFforrollingafairdieP(X = x)

x

Page 5: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

PMFproperties0 ≤ 𝑃 𝑋 = 𝑥 ≤ 1

∑𝑃 𝑋 = 𝑥 = 1��

PMFissimplyafanciertermforadiscreteprobabilitydistribution

Page 6: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

ExpressingdiscreterandomvariablesCumulativedistributionfunction(CDF)◦ Functiondefined,foraspecificvaluex ofadiscreter.v.X,asF(x) = P(X ≤ x)

0.00

0.25

0.50

0.75

1.00

1 2 3 4 5 6Event

Cum

ulat

ive p

roba

bilit

y

CDFforrollingafairdie

P(X ≤ 1)

P(X ≤ 4)

Page 7: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

CDFproperties0 ≤ 𝐹 𝑋 ≤ 1

CDFfunctionsarenon-decreasing

Page 8: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

PMFvsCDFPMF:WhatistheprobabilityofeventX?

CDF:Whatisthesumofprobabilitiesforallevents≤X?

Page 9: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

ExpectationandspreadofrandomvariablesTheexpectation ofar.v.istheprobability-weightedaverageofallpossiblevalues(i.e.,mean)◦𝔼 𝑋 = 𝜇 = ∑ 𝑥/𝑝(𝑥/)�

/

Thevarianceofar.v.isdefined◦𝑉𝑎𝑟 𝑋 = 𝜎7 = 𝔼[ 𝑋 − 𝜇 7] = ∑ [𝑥/7𝑝(𝑥/)�

/ ] − 𝜇7

◦𝑉𝑎𝑟 𝑋 = 𝔼[𝑋7] − 𝔼[𝑋]7

Page 10: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

Example:TheBinomialdistributionThebinomialdistribution describestheprobabilityofobtainingk successesinnBernoullitrials,wheretheprobabilityofsuccessforeachtrialisconstantatp

ABernoullitrial hasabinaryoutcome(success/fail,true/false,yes/no),andP(success)=p isthesameforallrealizationsofthetrial

Page 11: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

TheBInS conditionsTobebinomiallydistributed,mustsatisfythefollowing:

Binaryoutcomes

Independenttrials(outcomesdonotinfluenceeachother)

n isfixedbeforethetrialsbegin

Sameprobabilityofsuccess,p,foralltrials

Page 12: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

Isitbinomial?

Yes!

NoL

Abagcontains10balls,7redand3green.

Situation1:Youdraw5ballsfromthebag,notingtheballcoloreachtimeandthenreturningittothebag.

Situation2:Youdraw5ballsfromthebag,retainingeachdrawnballforsafe-keepingsoyoucanplaycatchatanymoment.

Situation3:Youkeepdrawingballs,withreplacement,untilyouhavedrawn4redballs. NoL

Page 13: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

ThebinomialdistributionThePMF(probabilitydistribution)forabinomially-distributedrandomvariable:

𝑃 𝑋 = 𝑘 = <= 𝑝

=(1 − 𝑝)(<>=)= <= 𝑝

=𝑞(<>=)

Thebinomialcoefficient: <= = <!=! <>= !

◦ readas"nchoosek"

Page 14: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

Wikipediaweighsin

Page 15: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

ThebinomialdistributionTheexpectation forabinomialr.v.◦ 𝔼 𝑋 = 𝜇 = np

Thevarianceforabinomialr.v.◦ 𝑉𝑎𝑟 𝑋 = 𝜎7 = npq = np(1 − p)

Wewritebinomiallydistributedr.v.'s as𝑋~𝐵(𝑛, 𝑝)

Page 16: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

Example:PlayingwithabinomialrvEachchildborntoaparticularsetofparentshasa25%probabilityofhavingbloodtypeO.Assumetheparentshadfivechildren.

Here,n=5andp=0.25,meaningwedefineTypeOas"success",andnotTypeOas"failure".à X~B(5,0.25)

Tasks:◦ Computeexpectationandvariance◦ VisualizePMF◦ VisualizeCDF◦ Makesomecalculations…

Page 17: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

ExpectationandvarianceEachchildborntoaparticularsetofparentshasa25%probabilityofhavingbloodtypeO.Assumetheparentshadfivechildren.B(5,0.25)

𝔼 𝑋 = 𝜇 = np =5*0.25=1.25

𝑉𝑎𝑟 𝑋 = 𝜎7 = npq = np(1 − p) =5*0.25*0.75=0.9375

Page 18: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

VisualizethePMF

0.2373046875

0.3955078125

0.263671875

0.087890625

0.01464843750.00097656250.0

0.1

0.2

0.3

0.4

0 1 2 3 4 5Number of kids

Prob

abilit

y Ty

pe O

Page 19: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

?distributionsDistributions in the stats package

Description:Density, cumulative distribution function, quantile function andrandom variate generation for many standard probabilitydistributions are available in the ‘stats’ package.

Details:The functions for the density/mass function, cumulativedistribution function, quantile function and random variategeneration are named in the form ‘dxxx’, ‘pxxx’, ‘qxxx’ and ‘rxxx’respectively.

For the beta distribution see ‘dbeta’.

For the binomial (including Bernoulli) distribution see ‘dbinom’.

For the Cauchy distribution see ‘dcauchy’.

For the chi-squared distribution see ‘dchisq’.

Page 20: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

Distributionfunctions,generallyFunction Purpose Binomialversion

dxxx() Probabilitydistribution dbinom(x, size, prob)

pxxx() CDF pbinom(q, size, prob)rxxx() Generaterandom

numbersfromgivendistribution

rbinom(n, size, prob)

qxxx() Quantile:Inverse ofpxxx()

qbinom(p, size, prob)

Page 21: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

BinomialdistributionfunctionsBinomial function Example Output

dbinom(x, size, prob) dbinom(2, 5, 0.25) Prob ofobtaining2successesin5trials,wherep=0.25à 0.263

pbinom(q, size, prob) pbinom(2, 5, 0.25) Prob ofobtaining≤2successesin5trials,wherep=0.25à 0.896

rbinom(n, size, prob) rbinom(100, 5, 0.25) Generate100kvaluesfromthisbinomialdist.à 100from{0,1,2,3,4}

qbinom(p, size, prob) qbinom(0.896, 5, 0.25) SmallestvaluexwhereF(x)>=p*à 2*notprob success,justaprob

Page 22: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

MakingthePMF0.2373046875

0.3955078125

0.263671875

0.087890625

0.01464843750.00097656250.0

0.1

0.2

0.3

0.4

0 1 2 3 4 5Number of kids

Prob

abilit

y Ty

pe O

> ## Use dbinom() to get the PMF values> p = 0.25> n = 5> k0 <- dbinom(0, 5, 0.25) ## Prob of 0 successes, aka no children are Type O> k1 <- dbinom(1, 5, 0.25) ## Prob of 1 success, aka only 1 child is Type O

> ## Advanced:> library(purrr)> map_dbl(0:5, dbinom, 5, 0.25)

[1] 0.2373046875 0.3955078125 0.2636718750 0.0878906250 0.0146484375[6] 0.0009765625

Page 23: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

MakingthePMF## data frame (tibble) of probabilities for PMF> data.pmf <- tibble(k = 0:5, prob = c(0.236623, 0.396, 0.264, 0.0879, 0.0145, 0.000977))> data.pmf

# A tibble: 6 x 2k prob

<int> <dbl>1 0 0.2366232 1 0.3960003 2 0.2640004 3 0.0879005 4 0.0145006 5 0.000977

## Equivalent> data.pmf <- tibble(k = 0:5, prob = map_dbl(0:5, dbinom, 5, 0.25))

Page 24: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

MakingthePMFusesadifferent*stat*> ggplot(data.pmf, aes(x = k, y=prob))+ geom_bar( stat="identity" ) +

xlab("Number of kids") + ylab("Probability Type O")

0.0

0.1

0.2

0.3

0.4

0 2 4Number of kids

Prob

abilit

y Ty

pe O

Page 25: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

Tweakingthex-axis> ggplot(data.pmf, aes(x = k, y=prob))+ geom_bar( stat="identity" ) +

ylab("Probability Type O") + scale_x_continuous(name = "Number of kids", breaks = 0:5)

0.0

0.1

0.2

0.3

0.4

0 1 2 3 4 5Number of kids

Prob

abilit

y Ty

pe O

Page 26: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

Addingsometext> ggplot(data.pmf, aes(x = k, y=prob))+ geom_bar( stat="identity" ) +

ylab("Probability Type O") + scale_x_continuous(name = "Number of kids", breaks = 0:5) +geom_text(aes(x = k, y= prob + 0.01, label = prob))

0.236623

0.396

0.264

0.0879

0.0145 0.0009770.0

0.1

0.2

0.3

0.4

0 1 2 3 4 5Number of kids

Prob

abilit

y Ty

pe O

Page 27: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

VisualizetheCDF> binom.sample <- tibble(x = rbinom(1000, 5, 0.25))> ggplot(binom.sample, aes(x=x)) + stat_ecdf() +

xlab("# Type O kids") + ylab("Cumulative probability")

0.00

0.25

0.50

0.75

1.00

0 1 2 3 4 5# Type O kids

Cum

ulat

ive p

roba

bility

Page 28: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

SolvingforprobabilitiesEachchildborntoaparticularsetofparentshasa25%probabilityofhavingbloodtypeO.Assumetheparentshadfivechildren.B(5,0.25)

Whatistheprobabilitythatexactly2childrenwereTypeO?> dbinom(2, 5, 0.25)

[1] 0.2636719

0.2373046875

0.3955078125

0.263671875

0.087890625

0.01464843750.00097656250.0

0.1

0.2

0.3

0.4

0 1 2 3 4 5Number of kids

Prob

abilit

y Ty

pe O

Page 29: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

SolvingforprobabilitiesEachchildborntoaparticularsetofparentshasa25%probabilityofhavingbloodtypeO.Assumetheparentshadfivechildren.B(5,0.25)

Whatistheprobabilitythatexactly2childrenwereTypeO?

𝑃 𝑋 = 2 = I7 0.25

70.75(I>7)

=10*0.0625*0.422=0.26375

𝑃 𝑋 = 𝑘 = 𝑛𝑘 𝑝=(1 − 𝑝)(<>=)=

𝑛𝑘 𝑝=𝑞(<>=)

0.2373046875

0.3955078125

0.263671875

0.087890625

0.01464843750.00097656250.0

0.1

0.2

0.3

0.4

0 1 2 3 4 5Number of kids

Prob

abilit

y Ty

pe O

Page 30: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

SolvingforprobabilitiesWhatistheprobabilitythat2orfewer childrenwereTypeO?

> pbinom(2, 5, 0.25) [1] 0.8964844

0.00

0.25

0.50

0.75

1.00

0 1 2 3 4 5# Type O kids

Cum

ulat

ive p

roba

bility

B(5,0.25)

Page 31: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

SolvingforprobabilitiesWhatistheprobabilitythat2orfewerchildrenwereTypeO?

> dbinom(0, 5, 0.25) + dbinom(1, 5, 0.25) + dbinom(2, 5, 0.25)

[1] 0.8964844

B(5,0.25)

0.2373046875

0.3955078125

0.263671875

0.087890625

0.01464843750.00097656250.0

0.1

0.2

0.3

0.4

0 1 2 3 4 5Number of kids

Prob

abilit

y Ty

pe O

Page 32: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

SolvingforprobabilitiesWhatistheprobabilitythatmorethan2children(ie either3,4,or5)wereTypeO?

> 1 - pbinom(2, 5, 0.25) [1] 0.1035156

0.00

0.25

0.50

0.75

1.00

0 1 2 3 4 5# Type O kids

Cum

ulat

ive p

roba

bility

B(5,0.25)

Page 33: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

SolvingforprobabilitiesWhatistheprobabilitythatmorethan2children(ie either3,4,or5)wereTypeO?

> dbinom(3, 5, 0.25) + dbinom(4, 5, 0.25) + dbinom(5, 5, 0.25)

[1] 0.1035156

B(5,0.25)

0.2373046875

0.3955078125

0.263671875

0.087890625

0.01464843750.00097656250.0

0.1

0.2

0.3

0.4

0 1 2 3 4 5Number of kids

Prob

abilit

y Ty

pe O

Page 34: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

BREATHE

Page 35: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

ExpressingcontinuousrandomvariablesProbabilitydensityfunction(PDF)◦ Describesthevaluestakenbyacontinuousr.v.X anditsassociatedprobabilities

◦ Functionsuchthatthearea underthecurvebetweenanytwopointsa,bcorrespondstotheprobabilitythatther.v.fallsbetweena,b

◦à 𝑃 𝑎 ≤ 𝑋 ≤ 𝑏 = ∫ 𝑓 𝑥 𝑑𝑥QR

Page 36: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

PDF

Prob

abili

ty d

ensi

ty

Y

a b

Prob

abili

ty d

ensi

ty

Y

a b

𝑃 𝑎 ≤ 𝑋 ≤ 𝑏 = S 𝑓 𝑥 𝑑𝑥Q

R

Page 37: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

PDFpropertiesContinuousr.v.'s areinfinitelyprecise:𝑃 𝑋 = 𝑥) = 𝑃(𝑥 ≤ 𝑋 ≤ 𝑥 = 0◦ ExactlyunlikePMFs

TotalareaunderthePDFequals1: ∫ 𝑓 𝑥 𝑑𝑥 = 1T>T

Probabilitiesaren'tnegative:𝑓 𝑥 ≥ 0

Page 38: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

ExpressingcontinuousrandomvariablesCumulativedistributionfunction(CDF)◦ Functiondefined,foraspecificvaluex ofacontinuousr.v.X,asF(x) = P(X ≤ x)◦ (mostly)thesameasfordiscrete

0.00

0.25

0.50

0.75

1.00

0.0 2.5 5.0 7.5 10.0x

y

0.00

0.25

0.50

0.75

1.00

5 10 15 20 25x

y

Page 39: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

RelationshipbetweenPDFandCDF

Page 40: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

Jumpingrightin:NormaldistributionThePDF(probabilitydistribution)foranormally-distributedrandomvariable:

𝑓 𝑥 = V7WXY� 𝑒𝑥𝑝 >([>\)Y

7XY

Wewritenormallydistributedr.v.'s as𝑋~𝑁(𝜇, 𝜎7)

It'sgross,everyoneknowsit,andyouwillbeneitherpluggingnorchuggingwiththisequation

Page 41: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

PDFofnormaldistributionExample,let'ssaywomen'sheights(cm)arenormallydistributedaccordingto𝑁(165, 64)◦ Popquiz:whatisthestandarddeviationofthisdistribution?

0.00

0.01

0.02

0.03

0.04

0.05

140 160 180Value

Probability

Page 42: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

Wikipediaweighsin

Page 43: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

MakingthePDFAnother"interesting"hack:

0.00

0.01

0.02

0.03

0.04

0.05

140 160 180Value

Probability

𝑁(165, 64)

> plot.range <- tibble(x = c(165 - 32, 165 + 32)) > ggplot(plot.range, aes(x=x)) +

stat_function(fun = dnorm, args=list(mean=164, sd=8))

Page 44: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

MakingtheCDF> data.cdf <- tibble(x = rnorm(10000, 164, 8))> ggplot(data.cdf, aes(x=x)) + stat_ecdf()

0.00

0.25

0.50

0.75

1.00

140 160 180x

y

Page 45: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

ExpectationandvarianceAnyguesses?

It'sinthedefinition:𝑋~𝑁(𝜇, 𝜎7)

Page 46: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

Typesofquestionsonecanask:◦ Whatistheprobabilitythatarandomly-chosenwomanistallerthan158cm?◦ Whatistheprobabilitythatarandomly-chosenwomanisbetween163—170cmtall?

◦ Whatistheprobabilitythatarandomly-chosenwomanisshorterthan167cm?

◦ Whatistheprobabilitythatarandomly-chosenwomanis168cmtall?

Workingwiththenormaldistribution

Page 47: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

Typesofquestionsonecanask:◦ Whatistheprobabilitythatarandomly-chosenwomanistallerthan158cm?◦ Whatistheprobabilitythatarandomly-chosenwomanisbetween163—170cmtall?

◦ Whatistheprobabilitythatarandomly-chosenwomanisshorterthan167cm?

◦ Whatistheprobabilitythatarandomly-chosenwomanis168cmtall?

Workingwiththenormaldistribution

Page 48: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

PropertiesofthenormaldistributionSymmetricaroundthemean

Mean=median=mode

Inflectionpoints

Page 49: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

Introducingthestandardnormal:𝑋~𝑁(0,1)

0.0

0.1

0.2

0.3

0.4

−5 −4 −3 −2 −1 0 1 2 3 4 5x

Probability

𝜇 =0

σ =1

TheseareZ-scores

Page 50: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

StandardNormal𝑋~𝑁(0,1)

The cumulative-distribution function (cdf) for a standard normal distribution is denoted by

( ) ( )x X xPr

where X follows an N(0,1) distribution. This function is shown in Figure 5.10.

The symbol is used as shorthand for the phrase “is distributed as.” Thus X ~ N(0,1) means that the random variable X is distributed as an N(0,1) distribution.

Column A in Table 3 of the Appendix presents (x) for various positive values of x for a standard normal distribution. This cumulative distribution function is illus-trated in Figure 5.11. Notice that the area to the left of 0 is .5.

0.4

0.3

0.2

0.1

f(x)

0.0

x0

f(x) = 12

e–(1/2)x2

0.4

0.3

0.2

0.1

f(x)

0.0

x0–1.00–1.96–2.58 1.00 1.96 2.58

68% of area95% of area99% of area

CHE-ROSNER-10-0205-005.indd 115 7/15/10 7:50:47 PM

Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.

Page 51: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

PDFandCDFof𝑋~𝑁(0,1)

Furthermore, the area to the left of x approaches 0 as x becomes small and ap-proaches 1 as x becomes large.

The right-hand tail of the standard normal distribution Pr(X x) is given in column B of Appendix Table 3.

If X ~ N(0,1), then find Pr(X 1.96) and Pr(X 1).

From the Appendix, Table 3, column A,

(1.96) .975 and (1) .8413

From the symmetry properties of the standard normal distribution,

( ) ( ) ( ) ( ) ( )x X x X x X x xPr Pr Pr1 1

This symmetry property is depicted in Figure 5.12 for x 1.

0.4

0.3

0.2

0.1

0.0x0

Pr(X x) = (x) =area to the left of x

f(x)

(x)

x–3 –2 –1 0

.0013

1 2 3

1.0

0.5

0.0.023

.16

.50

.84

.977 .9987

CHE-ROSNER-10-0205-005.indd 116 7/15/10 7:50:49 PM

Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.

Furthermore, the area to the left of x approaches 0 as x becomes small and ap-proaches 1 as x becomes large.

The right-hand tail of the standard normal distribution Pr(X x) is given in column B of Appendix Table 3.

If X ~ N(0,1), then find Pr(X 1.96) and Pr(X 1).

From the Appendix, Table 3, column A,

(1.96) .975 and (1) .8413

From the symmetry properties of the standard normal distribution,

( ) ( ) ( ) ( ) ( )x X x X x X x xPr Pr Pr1 1

This symmetry property is depicted in Figure 5.12 for x 1.

0.4

0.3

0.2

0.1

0.0x0

Pr(X x) = (x) =area to the left of x

f(x)

(x)

x–3 –2 –1 0

.0013

1 2 3

1.0

0.5

0.0.023

.16

.50

.84

.977 .9987

CHE-ROSNER-10-0205-005.indd 116 7/15/10 7:50:49 PM

Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.

Iftheshadedgreyarea=0.977,whatisx?

Page 52: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

StandardNormal𝑋~𝑁(0,1)Duetosymmetry,P(X≤-x)=1- P(X≤ x)

Calculate Pr(X 1.96) if X ~ N(0,1).

Pr(X 1.96) Pr(X 1.96) .0250 from column B of Table 3.

Furthermore, for any numbers a, b we have Pr(a X b) Pr(X b) Pr(X a) and thus we can evaluate Pr(a X b) for any a, b from Table 3.

Compute Pr( 1 X 1.5) if X ~ N(0,1).

Pr( 1 X 1.5) Pr(X 1.5) Pr(X 1)

Pr(X 1.5) Pr(X 1) .9332 .1587

.7745

Forced vital capacity (FVC), a standard measure of pulmonary function, is the volume of air a person can expel in 6 seconds. Current research looks at potential risk factors, such as cigarette smoking, air pollution, indoor al-lergies, or the type of stove used in the home, that may affect FVC in grade-school children. One problem is that age, sex, and height affect pulmonary function, and these variables must be corrected for before considering other risk factors. One way to make these adjustments for a particular child is to find the mean and standard deviation for children of the same age (in 1-year age groups), sex, and height (in 2-in. height groups) from large national surveys and compute a standardized FVC, which is defined as ( )X , where X is the original FVC. The standardized FVC then approximately follows an N(0,1) distribution, if the distribution of the original FVC values was bell-shaped. Suppose a child is considered in poor pulmonary health if his or her standardized FVC 1.5. What percentage of children are in poor pul-monary health?

Pr(X 1.5) Pr(X 1.5) .0668

Thus about 7% of children are in poor pulmonary health.

0.4

0.3

0.2

0.1

0.0

x0–1 1

1 – (1)(–1)

f(x)

CHE-ROSNER-10-0205-005.indd 117 7/15/10 7:50:50 PM

Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.

Page 53: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

For𝑋~𝑁(0,1),whatistheprobabilityP(X≤0.47)?

0.0

0.1

0.2

0.3

0.4

−5 −4 −3 −2 −1 0 1 2 3 4 5Z

Probability

0.47

# CDF: P(X <= 0.47)> pnorm(0.47)

[1] 0.6808225

Page 54: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

Normaldistributionfunctions

Normalfunction Meaning

dnorm(x) Density atX=x

pnorm(q) P(X<= x)

rnorm(n) Generate nrandomdrawsfromN(0,1)

qnorm(p) ObtainxfromgivenCDFarea:qnorm(0.6808225) = 0.47

0.0

0.1

0.2

0.3

0.4

−5 −4 −3 −2 −1 0 1 2 3 4 5Z

Probability

0.47

Page 55: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

For𝑋~𝑁(0,1),whatistheprobabilityP(-1.32≤x≤0.47)?

0.0

0.1

0.2

0.3

0.4

−5 −4 −3 −2 −1 0 1 2 3 4 5Z

Probability

0.47-1.32

Page 56: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

For𝑋~𝑁(0,1),whatistheprobabilityP(-1.32≤x≤0.47)?

0.0

0.1

0.2

0.3

0.4

−5 −4 −3 −2 −1 0 1 2 3 4 5Z

Probability

0.47-1.32

# P(X <= 0.47)> pnorm(0.47)

[1] 0.6808225

0.0

0.1

0.2

0.3

0.4

−5 −4 −3 −2 −1 0 1 2 3 4 5Z

Probability

-1.32

0.0

0.1

0.2

0.3

0.4

−5 −4 −3 −2 −1 0 1 2 3 4 5Z

Probability

0.47

= -

# P(X <= -1.32)> pnorm(-1.32)

[1] 0.093417510.587405

Page 57: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

For𝑋~𝑁(0,1),whatistheprobabilityP(-1≤x≤1)?AKAprobabilityofbeingwithin1standarddeviationofmean?

~0.68

Page 58: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

For𝑋~𝑁(0,1),whatistheprobabilityP(x≥ 2.14)?

## Two approaches:

> 1 - pnorm(2.14)[1] 0.01617738

> pnorm(-2.14)[1] 0.01617738

0.0

0.1

0.2

0.3

0.4

−5 −4 −3 −2 −1 0 1 2 3 4 5Z

Probability

2.14

Page 59: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

For𝑋~𝑁(0,1),thetop8%ofthedistributionfallsabovewhatnumber?

0.0

0.1

0.2

0.3

0.4

−5 −4 −3 −2 −1 0 1 2 3 4 5Z

Probability

Area=0.08

???

> qnorm(1 - 0.08)[1] 1.405072

> -1 * qnorm(0.08)[1] 1.405072

Page 60: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

Historicalconsiderationofz-scores

Page 61: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

Re-scalingtostandardnormaltocomparedistributions𝑍 = [>\

X◦x=distributionvalueofinterest("rawscore")◦𝜇,𝜎 =r.v./populationmean,standarddeviation

Page 62: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

Example:Weightforapopulationofrabbitsfollowsanormaldistribution𝑁(2.6, 1.1)

WhatistheZ-scorefora3poundrabbit?

𝑍 = [>\X

=a>7.bV.V� =0.381

Whatisprobabilityarabbitweighslessthan3pounds?pnorm(0.381) = 0.648

Doesismakesensethatthisnumberispositive?

pnorm(3, 2.6, sqrt(1.1)) = 0.648 THEFUTUREISNOW

Page 63: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

Normaldistributionsfunctions,revisitedAllfunctionsassumestandardnormal.Provideadditionalargumentsforothernormals:

Standardnormal Anynormalpnorm(q) = pnorm(q, 0, 1) pnorm(q, mean, sd)

Page 64: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

Z-scoresaremostusefulforcomparingdifferentdistributionsWeightforrabbitpopAisdistributed𝑁(2.6, 1.1)

WeightforrabbitpopBisdistributed𝑁(2.9, 0.17)

Whichofthesetworabbitsisbigger?PopArabbitweighting2.95lbs,orpopBrabbitweighing3.1lbs?

PopulationA:𝑍 = [>\X= 7.dI>7.b

V.V� = 0.334

PopulationB:𝑍 = [>\X= a.V>7.d

f.Vg� = 0.485

Page 65: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

PuttingitalltogetherTheheightofEuropeanmenisdistributedas𝑁 175, 53.3

TheheightofEuropeanwomenisdistributedas𝑁(162.5, 34.8)

Whatproportionofmenisshorterthan150cm,akaP(man<150)?UsingZ-scores

𝑍 = [>\X= VIf>VgI

Ia.a� = -3.424

> pnorm(-3.424)[1] 0.0003085331

SkippingZ-scores

> pnorm(150, 175, sqrt(53.3))[1] 0.0003081516

Page 66: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

PuttingitalltogetherWhatproportionofwomenistallerthan162.5cm?

Men:𝑁 175, 53.3Women:𝑁(162.5, 34.8)

50%

Page 67: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

PuttingitalltogetherWhatproportionofwomenistallerthan170cm?

Men:𝑁 175, 53.3Women:𝑁(162.5, 34.8)

UsingZ-scores

𝑍 = [>\X= Vgf>Vb7.I

ai.j� = 1.2713

> 1 - pnorm(1.2713)[1] 0.101811

SkippingZ-scores

> 1 - pnorm(170, 162.5, sqrt(34.8))[1] 0.1017987

Page 68: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

PuttingitalltogetherWhatisthetallestawomancanbeandstillbeinthebottom22%?

Men:𝑁 175, 53.3Women:𝑁(162.5, 34.8)

UsingZ-scores

> qnorm(0.22)[1] -0.7721932

𝑍 = [>\X à x = 𝑍𝜎 + 𝜇

= −0.7722 ∗ 34.8� + 162.5= 𝟏𝟓𝟕. 𝟗𝒄𝒎

SkippingZ-scores

> qnorm(0.22, 162.5, sqrt(34.8))[1] 157.9447

Page 69: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

PuttingitalltogetherWhatistheshortest awomancanbeandstillbeinthetop 22%?

Men:𝑁 175, 53.3Women:𝑁(162.5, 34.8)

UsingZ-scores

> -1 * qnorm(0.22)[1] 0.7721932

𝑍 = [>\X à x = 𝑍𝜎 + 𝜇

= 0.7722 ∗ 34.8� + 162.5= 𝟏𝟔𝟕. 𝟎𝟓𝒄𝒎

SkippingZ-scores

> qnorm(1-0.22, 162.5, sqrt(34.8))[1] 167.0553

Page 70: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

PuttingitalltogetherWhatistheprobabilityarandomlychosenmanisbetween175–182cmtall?

à P(X<182)– P(X<175)=P(X<182)– 0.5

Men:𝑁 175, 53.3Women:𝑁(162.5, 34.8)

> pnorm(182, 175, sqrt(53.3)) – 0.5[1] 0.3311738

Page 71: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

PuttingitalltogetherWhatistheprobabilityarandomlychosenmaniseitherbetween175–182cmtallor between150—160cmtall?

à P(175<X<182)+P(150<X<160)

Men:𝑁 175, 53.3Women:𝑁(162.5, 34.8)

### First probability> pnorm(182, 175, sqrt(53.3)) – 0.5[1] 0.3311738

> ### Second prob.> pnorm(160, 175, sqrt(53.3)) – pnorm(150, 175, sqrt(53.3))[1] 0.01965059

> 0.3311738 + 0.01965059[1] 0.3508244

Page 72: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

PuttingitalltogetherIhavetworandomly-chosenEuropeanfriends,onemanandonewomaneach.Whatistheprobabilitythemanisatleast180cmandthewomanisbetween163—170cm?

à P(man>180)xP(163<woman<170)

Men:𝑁 175, 53.3Women:𝑁(162.5, 34.8)

### First probability> 1 - pnorm(180, 175, sqrt(53.3))

[1] 0.2467138

> ### Second prob.> pnorm(170, 162.5, sqrt(34.8)) – pnorm(163, 162.5, sqrt(34.8))

[1] 0.3644282

> 0.246713*0.3644282[1] 0.08990917

Page 73: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

PuttingitalltogetherIhavetwonewrandomly-chosenEuropeanfriends,onemanandonewomaneach.Whatistheprobabilitythemanis180cmandthewomanis163cm?

à P(man=180)xP(woman=163)

à 0

Men:𝑁 175, 53.3Women:𝑁(162.5, 34.8)

Page 74: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

PuttingitalltogetherAssume50.8%ofEuropeansarewomen.Ifarandomly-chosenpersonisshorterthan155cmtall,whatistheprobabilitythepersonisawoman?

à P(woman|<155)=

Men:𝑁 175, 53.3Women:𝑁(162.5, 34.8)

### P(<155 | woman)> pnorm(155, 162.5, sqrt(34.8))

[1] 0.1017987

P(<155|woman)*P(woman)/P(<155)

0.102 0.508

Page 75: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

SolvingthedenominatorP(<155)=P(<155andman)+P(<155andwoman)=

P(<155|man)*P(man)+P(<155|woman)*P(woman)

### P(<155 | man) > pnorm(155, 175, sqrt(53.3))

[1] 0.003076926

0.0031 0.492 0.102 0.508

=0.0533

Page 76: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

PuttingitalltogetherAssume50.8%ofEuropeansarewomen.Ifarandomly-chosenpersonisshorterthan155cm,whatistheprobabilitythepersonisawoman?

à P(woman|<155)=

Men:𝑁 175, 53.3Women:𝑁(162.5, 34.8)

P(<155|woman)*P(woman)/P(<155)

0.102 0.508 0.533

=0.972

Page 77: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

BREAK

Page 78: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

StatisticalinferencePopulation Sample

Random sampling

Statistical inferencePopulation parameters !, "

Sample estimates x, s

Page 79: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

TwomainflavorsofstatisticalinferenceEstimation◦ Estimateapopulationparameterfromsampledata◦ Pointestimates:Whatisthepopulationmean?◦ Intervalestimates:Inwhatrangeofvaluesisthepopulationmeanlikelytofall?

Hypothesistesting◦ Testwhetherthevalueofapopulationparameterisequaltosomespecificvalue◦ Isthereevidencethatmysamplediffersfromsomeunderlyingpopulation?

Page 80: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

ThesamplingdistributionTheprobabilitydistributionofvaluesforanestimatethatweobtainundersampling

Page 81: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

Obtainingasamplingdistribution

0

1000

2000

3000

0 5000 10000 15000nucleotides

count

> genes <- read.csv("genes.csv")> head(genes)

nucleotides1 37852 74163 21354 76825 57666 11079

> mean(genes$nucleotides)[1] 2761.039 > sd(genes$nucleotides)[1] 2037.645

> ggplot(genes, aes(x=nucleotides)) + geom_histogram(fill="white", color="black")

Page 82: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

Obtainingasamplingdistribution### the function sample_n draws a random sample of rows

> small.sample <- genes %>% sample_n(25) > mean(small.sample$nucleotides)

[1] 2151.8

> ggplot(small.sample , aes(x = nucleotides)) + geom_histogram() +geom_vline(xintercept=2151.8, color="blue") + geom_vline(xintercept= 2761.039, color="red")

0

1

2

3

0 1000 2000 3000 4000nucleotides

count

geom_vline(xintercept=…)geom_hline(yintercept=…)geom_abline(yintercept=…, slope=…)

ThesamplemeanforarandomsampleofN=25is𝒙w = 𝟐𝟏𝟓𝟏. 𝟖

Page 83: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

ObtainingasamplingdistributionNowimaginewedraw20samplesofN=25andcomputeeachoftheirmeans:> head(n20.means)sample.mean

1 2584.842 2574.123 2382.644 3143.685 2252.566 2368.44

Samplingdistributionofthemean

0

1

2

3

2000 2500 3000 3500sample.mean

count

Page 84: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

QuantifyingthesamplingdistributionThestandarderror isthestandarddeviationoftheestimateofthesamplingdistribution◦ Standarderrorofthemean:𝑆𝐸[̅ =

X<�,approximatewith }

<�

◦ SEisnotthestandarddeviationofasample◦ Here,nrepresentsthenumberofsamples (not thesamplesize)

Italsoquantifiestheprecisionofourestimate,i.e.howfarfromthepopulationparameterweare

Page 85: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

Computingthestandarderrorofthemean

> head(n20.means)sample.mean

1 2584.842 2574.123 2382.644 3143.685 2252.566 2368.44

> sd(n20.means$sample.mean) / sqrt(20)[1] 93.11888

0

1

2

3

2000 2500 3000 3500sample.mean

count

Samplingdistributionofthemean

Page 86: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

SeveralsamplingdistributionscomprisedofNsamples,eachofn=25

0

1

2

3

2000 2500 3000 3500sample.mean

count

0

1

2

3

4

5

2000 2500 3000 3500 4000sample.mean

count

0

25

50

75

100

2000 3000 4000sample.mean

count

0

300

600

900

1200

2000 3000 4000sample.mean

count

N=20 N=50 N=100 N=1000 N=10000

0.0

2.5

5.0

7.5

10.0

12.5

2000 2500 3000 3500 4000sample.mean

count

Page 87: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

StandarderrordecreasesasNincreases

0

1

2

3

2000 2500 3000 3500sample.mean

count

0

1

2

3

4

5

2000 2500 3000 3500 4000sample.mean

count

0

25

50

75

100

2000 3000 4000sample.mean

count

0

300

600

900

1200

2000 3000 4000sample.mean

count

N=20 N=50 N=100 N=1000 N=10000

SE=93.1 SE=58.1 SE=37.9 SE=13.3 SE=4.02

0.0

2.5

5.0

7.5

10.0

12.5

2000 2500 3000 3500 4000sample.mean

count

Page 88: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

Therefore,meanofsamplingdistributionapproachespopulationmean 2761.039

0

1

2

3

2000 2500 3000 3500sample.mean

count

0

1

2

3

4

5

2000 2500 3000 3500 4000sample.mean

count

0

25

50

75

100

2000 3000 4000sample.mean

count

0

300

600

900

1200

2000 3000 4000sample.mean

count

N=20 N=50 N=100 N=1000 N=10000

SE=93.1 SE=58.1 SE=37.9 SE=13.3 SE=4.02𝒙w=2780.89 𝒙w=2753.91 𝒙w=2781.51 𝒙w=2777.02 𝒙w=2763.82

0.0

2.5

5.0

7.5

10.0

12.5

2000 2500 3000 3500 4000sample.mean

count

Page 89: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

TheCentralLimitTheoremAssamplesizeincreases,thesamplingdistributionofthemean willbeapproximatelynormalregardlessoftruepopulationdistribution

0

300

600

900

1200

2000 3000 4000sample.mean

count

0

1000

2000

3000

0 5000 10000 15000nucleotides

count

Populationdistribution N=1e4samplingdistribution

Page 90: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312

Nextweek..Introductiontohypothesistestingandcomparingmeans

Morefunfactsonestimationwillcomelaterinthesemester,tobebundledwith*likelihood*