SUMMARY

summary

  • Upload
    prisca

Page 1: summary

SUMMARY

Page 2: summary

• Z-distribution

• Central limit theorem

Page 3: summary

Sweet demonstration of the sampling distribution of the mean

Page 4: summary

Sweet data

𝑛=20

Page 5: summary

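R-code – sampling distribution exact

# The 20 data values
data.set <- c(6,4,5,3,10,3,5,3,6,5,4,8,7,2,8,5,8,5,4,0)

mean(data.set)  # mean of the data

sd(data.set)*sqrt(19/20)  # standard deviation with divisor n (sd() itself divides by n-1)

(sd(data.set)*sqrt(19/20))/sqrt(20)  # the standard deviation above divided by sqrt(20)

sample_size <- 5

samps <- combn(data.set, sample_size)  # every possible sample of size 5, one per column

xbars <- colMeans(samps)  # mean of each sample

barplot(table(xbars))  # exact sampling distribution of the mean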

Page 6: summary

Sampling distribution – exact

$\mu_{\bar{x}} = M = ??$

$M = \mu = 5.05$
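The equality of the two means can be checked directly on the exact sampling distribution. The sketch below reuses the deck's `data.set` and sample size of 5; the names `mu`, `M`, `sigma`, `sd_xbars` and `se_fpc` are introduced here for illustration only. The comparison with the finite population correction is an addition, not something stated on the slides.

```r
# Exact sampling distribution of the mean for samples of size 5 (without replacement)
data.set <- c(6,4,5,3,10,3,5,3,6,5,4,8,7,2,8,5,8,5,4,0)
N <- length(data.set)
n <- 5

xbars <- colMeans(combn(data.set, n))   # one mean per possible sample

mu <- mean(data.set)                    # population mean
M  <- mean(xbars)                       # mean of the sampling distribution

# Standard deviations with divisor N (population formulas, not sd())
sigma    <- sqrt(mean((data.set - mu)^2))
sd_xbars <- sqrt(mean((xbars - M)^2))

# Standard error with the finite population correction (assumption: added for comparison)
se_fpc <- sigma / sqrt(n) * sqrt((N - n) / (N - 1))

print(c(mu = mu, M = M))
print(c(sd_xbars = sd_xbars, se_fpc = se_fpc))
```

Because `combn` enumerates every sample exactly once, the two means agree exactly rather than approximately.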

Page 7: summary

R-code (sampling distribution simulated)

data.set <- c(6,4,5,3,10,3,5,3,6,5,4,8,7,2,8,5,8,5,4,0)

sample_size<-3

number_of_samples<-20

samples <- replicate(number_of_samples, sample(data.set, sample_size, replace=TRUE))  # one sample per column

out <- colMeans(samples)  # mean of each simulated sample

mean(out)

sd(out)

barplot(table(out))
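With only 20 simulated samples the result is noisy. A minimal sketch, assuming we simply raise `number_of_samples` to 10000 and fix a seed (both choices are mine, not the deck's), shows the simulated mean and standard deviation settling near the population mean and near sigma divided by the square root of the sample size:

```r
set.seed(42)  # assumption: fixed seed so the run is reproducible

data.set <- c(6,4,5,3,10,3,5,3,6,5,4,8,7,2,8,5,8,5,4,0)
sample_size <- 3
number_of_samples <- 10000  # assumption: much larger than the 20 used above

samples <- replicate(number_of_samples, sample(data.set, sample_size, replace = TRUE))
out <- colMeans(samples)

sigma <- sd(data.set) * sqrt(19/20)  # standard deviation with divisor n

print(c(mean_out = mean(out), population_mean = mean(data.set)))
print(c(sd_out = sd(out), sigma_over_sqrt_n = sigma / sqrt(sample_size)))

barplot(table(round(out, 2)))
```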

Page 8: summary

Sampling distribution – simulated

Page 9: summary

Sampling distribution – simulated

Page 10: summary

ESTIMATION

Page 11: summary

Statistical inference

If we can’t conduct a census, we collect data from a sample of the population.

Goal: make conclusions about that population

Page 12: summary

Demonstration problem

• You sample 36 apples from your farm’s harvest of over 200 000 apples. The mean weight of the sample is 112 grams (with a 40 gram sample standard deviation).

• What is the probability that the mean weight of all 200 000 apples is between 100 and 124 grams?

Page 13: summary

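What is the question?

• We would like to know the probability that the population mean $\mu$ is within 12 of the sample mean $\bar{x}$.

• But this is the same thing as the probability that the sample mean $\bar{x}$ is within 12 of the population mean $\mu$.

• But this is the same thing as the probability that $\bar{x}$ is within 12 of the mean of the sampling distribution, because $\mu_{\bar{x}} = \mu$.

• So, if I am able to say how many standard deviations away from $\mu_{\bar{x}}$ I am, I can use the Z-table to figure out the probability.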

Page 14: summary

Slight complication

• There is one caveat, can you see it?

• We don’t know the standard deviation of the sampling distribution (the standard error). We only know it equals $\sigma / \sqrt{n}$, but $\sigma$ is unknown.

• What we’re going to do is estimate $\sigma$. The best thing we can use is the sample standard deviation $s$, which equals 40.

• $s / \sqrt{n} = 40 / \sqrt{36} \approx 6.67$. This is our best estimate of the standard error.

• Now you finish the example. What is the probability that the population mean lies within 12 of the sample mean if the SE equals 6.67?

• 92.82%
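The arithmetic of the apple example can be sketched in R as follows, using `pnorm` in place of the Z-table. The variable names are illustrative, and the small difference from 92.82% comes from the table's rounding.

```r
# Apple example: n = 36, sample mean 112 g, sample standard deviation 40 g
n <- 36
s <- 40
half_width <- 12                 # "within 12 of the sample mean"

se <- s / sqrt(n)                # estimated standard error, about 6.67
z  <- half_width / se            # number of standard errors, 1.8

prob <- pnorm(z) - pnorm(-z)     # area under the standard normal curve between -z and z

print(c(se = se, z = z, prob = prob))
```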

Page 15: summary

This is neat!

• You sample 36 apples from your farm’s harvest of over 200 000 apples. The mean weight of the sample is 112 grams (with a 40 gram sample standard deviation). What is the probability that the population mean weight of all 200 000 apples is between 100 and 124 grams?

• We started with very little information (we know just the sample statistics), but we can infer that with a probability of 92.82% the population mean lies within 12 of our sample mean!

Page 16: summary

Point vs. interval estimate

• You sample 36 apples from your farm’s harvest of over 200 000 apples. The mean weight of the sample is 112 grams (with a 40 gram sample standard deviation).

• Goal: estimate a population mean

1. A population mean is estimated as a sample mean, i.e. we say the population mean equals 112 g. This is called a point estimate (Czech: bodový odhad).

2. However, we can do better. We can estimate that our true population mean will lie, with 95% confidence, within the interval below (interval estimate).

$\bar{x} \pm 1.96 \times \frac{s}{\sqrt{n}}$
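For the apple data, the 95% interval estimate can be worked out as in this sketch. It uses `qnorm(0.975)` rather than the rounded 1.96, and the variable names are illustrative.

```r
# 95% interval estimate for the apple example
x_bar <- 112   # sample mean (g)
s     <- 40    # sample standard deviation (g)
n     <- 36    # sample size

z      <- qnorm(0.975)        # about 1.96
margin <- z * s / sqrt(n)     # margin of error

ci <- c(lower = x_bar - margin, upper = x_bar + margin)
print(round(ci, 2))
```

The interval is slightly wider than the 100 to 124 g range from the demonstration problem, which matches the step up from 92.82% to 95%.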

Page 17: summary

Confidence interval

• This type of result is called a confidence interval (Czech: interval spolehlivosti, konfidenční interval).

• The number of standard errors you want to add/subtract depends on the confidence level (e.g. 95%) (Czech: hladina spolehlivosti).

$\bar{x} \pm Z \times \frac{s}{\sqrt{n}}$

• $Z \times \frac{s}{\sqrt{n}}$: margin of error (Czech: možná odchylka)

• $Z$: critical value (Czech: kritická hodnota)

Page 18: summary

Confidence level

• The desired level of confidence is set by the researcher (not determined by data).

• If you want to be 95% confident with your results, you add/subtract 1.96 standard errors (the empirical rule says about 2 standard errors).

• 95% confidence interval (Czech: 95% interval spolehlivosti)

Confidence level (%)    Z-value
80                      1.28
90                      1.64
95                      1.96
98                      2.33
99                      2.58
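The Z-values in the table can be reproduced with `qnorm`. This is a small sketch: for a two-sided interval at a given confidence level, the critical value is the standard normal quantile at `1 - (1 - level) / 2`. The names `level` and `z_value` are illustrative.

```r
# Critical values of the standard normal distribution for common confidence levels
level   <- c(0.80, 0.90, 0.95, 0.98, 0.99)
z_value <- qnorm(1 - (1 - level) / 2)   # two-sided: split the remaining area between both tails

print(data.frame(confidence_level = 100 * level, z_value = round(z_value, 2)))
```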

Page 19: summary

[Figure: four panels for the confidence levels 80%, 90%, 95% and 99%, with critical values 1.28, 1.64, 1.96 and 2.58.]

Page 20: summary

Small sample size confidence intervals

• The blood pressure of 7 patients was measured after they had been given a new drug for 3 months. They had blood pressure increases of 1.5, 2.9, 0.9, 3.9, 3.2, 2.1 and 1.9. Construct a 95% confidence interval for the true expected blood pressure increase for all patients in a population.

Page 21: summary

CLT consequence

• Change in blood pressure is a biological process. It’s going to be a sum of thousands or millions of microscopic processes.

• Generally, biological and physical processes can be viewed as being affected by a large number of random subprocesses with individually small effects.

• The sum of all these random components creates a random variable that converges to a normal distribution regardless of the underlying distribution of processes causing the small effects.

• Thus, the Central Limit Theorem explains the ubiquity of the famous "Normal distribution" in the measurements domain.
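A small simulation can illustrate the claim. In this sketch each "measurement" is the sum of 500 small effects drawn from a clearly non-normal (exponential) distribution. The effect distribution, the counts and the seed are all assumptions chosen for illustration.

```r
set.seed(1)  # assumption: fixed seed for reproducibility

n_obs     <- 2000   # number of simulated measurements
n_effects <- 500    # small random effects summed into each measurement

# Skewed effects: exponential with mean 1, shifted to mean 0 (variance 1)
effects <- matrix(rexp(n_obs * n_effects, rate = 1) - 1, nrow = n_obs)
totals  <- rowSums(effects)

z <- totals / sqrt(n_effects)          # standardize: the sum has variance n_effects

share_within <- mean(abs(z) < 1.96)    # a normal distribution would give about 0.95
print(share_within)

hist(z, breaks = 40, freq = FALSE, main = "Sum of many small effects")
curve(dnorm(x), add = TRUE)            # standard normal density for comparison
```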

Page 22: summary

• We will assume that our population distribution is normal, with mean $\mu$ and standard deviation $\sigma$.

• We don’t know anything about this distribution but we have a sample. Let’s figure out everything you can figure out about this sample: $n = 7$, $\bar{x} \approx 2.34$, $s \approx 1.04$.

• We’ve been estimating the true population standard deviation $\sigma$ with our sample standard deviation $s$.

• However, we are estimating our standard deviation with $n$ of only 7! This is probably not going to be a very good estimate.

• In general, if $n < 30$, this is considered a bad estimate.
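The sample statistics for the seven blood pressure increases can be computed as in this sketch (the vector name `increase` is illustrative):

```r
# Blood pressure increases of the 7 patients
increase <- c(1.5, 2.9, 0.9, 3.9, 3.2, 2.1, 1.9)

n     <- length(increase)   # sample size
x_bar <- mean(increase)     # sample mean
s     <- sd(increase)       # sample standard deviation (divisor n - 1)

print(c(n = n, x_bar = round(x_bar, 2), s = round(s, 2)))
```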

Page 23: summary

William Sealy Gosset aka Student

• 1876-1937

• an employee of Guinness brewery

• 1908 papers addressed the brewer's concern with small samples

• "The probable error of a mean". Biometrika 6 (1): 1–25. March 1908.

• "Probable error of a correlation coefficient". Biometrika 6 (2/3): 302–310. September 1908.

Page 24: summary

Student t-distribution

• Instead of assuming a sampling distribution is normal we will use a Student t-distribution.

• It gives a better estimate of your confidence interval if you have a small sample size.

• It looks very similar to a normal distribution, but it has fatter tails to indicate the higher frequency of outliers which come with a small data set.
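The fatter tails can be seen numerically. The sketch below compares 95% critical values and the probability of landing more than 3 units from the centre for a few degrees of freedom; the particular `df` values are chosen for illustration.

```r
# Compare the Student t-distribution with the standard normal distribution
df <- c(2, 6, 10, 30, 100)

t_crit <- qt(0.975, df)      # 95% critical values of t
z_crit <- qnorm(0.975)       # 95% critical value of the normal, about 1.96

tail_t <- 2 * pt(-3, df)     # P(|T| > 3) under t
tail_z <- 2 * pnorm(-3)      # P(|Z| > 3) under the normal

print(data.frame(df, t_crit = round(t_crit, 3), z_crit = round(z_crit, 3),
                 tail_t = signif(tail_t, 3), tail_z = signif(tail_z, 3)))
```

The t critical values shrink toward the normal value as the degrees of freedom grow, which is why the distinction matters mostly for small samples.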

Page 25: summary

Student t-distribution

Page 26: summary

Student t-distribution

df – degrees of freedom (Czech: stupeň volnosti)

Page 27: summary

Back to our case

• Because the sample size is small, the sampling distribution of the mean won’t be normal. Instead, it will have a Student t-distribution with $df = n - 1 = 6$.

• Construct a 95% confidence interval, please

for $n < 30$: $\bar{x} \pm t_{n-1} \times \frac{s}{\sqrt{n}}$
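One way to work the blood pressure exercise in R is sketched below. It applies the small-sample formula by hand and then cross-checks against the built-in `t.test`; the variable names are illustrative.

```r
# 95% confidence interval for the mean blood pressure increase (n = 7)
increase <- c(1.5, 2.9, 0.9, 3.9, 3.2, 2.1, 1.9)

n     <- length(increase)
x_bar <- mean(increase)
s     <- sd(increase)

t_crit <- qt(0.975, df = n - 1)   # critical value of t with 6 degrees of freedom
margin <- t_crit * s / sqrt(n)    # margin of error

ci <- c(lower = x_bar - margin, upper = x_bar + margin)
print(round(ci, 2))

# Cross-check with R's one-sample t-test
ci_builtin <- as.numeric(t.test(increase, conf.level = 0.95)$conf.int)
print(round(ci_builtin, 2))
```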

Page 28: summary

• Just to summarize, the margin of error depends on

1. the confidence level (common is 95%)

2. the sample size

  • as the sample size increases, the margin of error decreases

  • For a bigger sample we have a smaller interval in which we’re pretty sure the true population mean lies.

3. the variability of the data (i.e. on σ)

  • more variability increases the margin of error

• Margin of error does not measure anything other than chance variation.

• It doesn’t measure any bias or errors that happen during the process.

• It does not tell anything about the correctness of your data!!!

$\text{something} \times \frac{s}{\sqrt{n}}$
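The three dependencies can be tried out with a small helper. `margin_of_error` is a hypothetical function written for this sketch, and it assumes the "something" is a normal critical value (large-sample case); for small samples `qt` would take the place of `qnorm`.

```r
# Hypothetical helper: margin of error = critical value * s / sqrt(n)
# Assumption: large sample, so the critical value comes from the normal distribution
margin_of_error <- function(s, n, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  z * s / sqrt(n)
}

base <- margin_of_error(s = 40, n = 36)                       # apple example at 95%
higher_confidence <- margin_of_error(s = 40, n = 36, level = 0.99)
larger_sample     <- margin_of_error(s = 40, n = 144)
more_variability  <- margin_of_error(s = 80, n = 36)

print(round(c(base = base, higher_confidence = higher_confidence,
              larger_sample = larger_sample, more_variability = more_variability), 2))
```

Quadrupling the sample size halves the margin, while doubling the standard deviation doubles it.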